Acta Agriculturae Zhejiangensis

• Biosystems Engineering •

Numerical characterization of DNA sequences based on multisets of pseudo amino acids and its applications

  

  1. (College of Mathematics and Physics, Bohai University, Jinzhou, Liaoning 121013)
  • Publication date: 2015-07-25 Release date: 2015-08-03

Numerical characterization of DNA sequences based on multisets of pseudo amino acids and its applications

  1. (College of Mathematics and Physics, Bohai University, Jinzhou 121013, China)
  • Online: 2015-07-25 Published: 2015-08-03

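Abstract: Using the mapping between codons and the amino acids and stop signal, a pseudo-amino-acid sequence of a DNA sequence is proposed. Then, with the help of multisets, a 21-dimensional numerical vector representation of the DNA sequence is constructed, from which the similarity distance between DNA sequences can be computed. Phylogenetic analyses of three datasets (complete S-segment gene sequences of hantaviruses, complete genome sequences of Tomato yellow leaf curl virus, and complete genome sequences of human rhinoviruses) demonstrate the effectiveness of the proposed method.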

Keywords: DNA, pseudo amino acid, multiset, phylogenetic analysis

Abstract: Based on the mapping from codons to amino acids and the stop signal, a pseudo amino acid sequence is proposed for a DNA sequence. Then, by means of multisets, a 21-dimensional numerical vector of the DNA sequence is constructed. On the basis of this vector, the similarity distance between any two DNA sequences can be calculated. Phylogenetic analyses of three datasets (the S segment of hantaviruses, complete genome sequences of Tomato yellow leaf curl virus, and complete genome sequences of human rhinoviruses) demonstrate the effectiveness of the proposed method.

Key words: DNA, pseudo amino acid, multiset, phylogenetic analysis
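
The construction outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the reading frame, how the multiset is turned into a vector, or which distance is used, so the choices below are assumptions made only for illustration: non-overlapping codons read from the first base, the standard genetic code, relative frequencies of the 21 symbols (20 amino acids plus the stop signal `*`), and Euclidean distance. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Standard genetic code in the usual compact TCAG ordering; '*' is the stop signal.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# 21 symbols: 20 amino acids plus the stop signal.
SYMBOLS = sorted(set(AMINO))


def pseudo_amino_acids(dna, frame=0):
    """Map a DNA string to a pseudo amino acid string.

    Assumption: non-overlapping codons starting at `frame`; codons containing
    characters other than A, C, G, T are skipped.
    """
    dna = dna.upper()
    out = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is not None:
            out.append(aa)
    return "".join(out)


def multiset_vector(dna):
    """21-dimensional vector of symbol frequencies (assumed normalization)."""
    counts = Counter(pseudo_amino_acids(dna))  # multiset of symbols
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(SYMBOLS)
    return [counts[s] / total for s in SYMBOLS]


def distance(u, v):
    """Euclidean distance between two vectors (assumed similarity distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    x = multiset_vector("ATGGCCTAA")
    y = multiset_vector("ATGGCCGCCTGA")
    print(len(x), distance(x, y))
```

A matrix of pairwise distances computed this way could then be passed to a standard tree-building method for phylogenetic analysis; which method the authors used is not stated in the abstract.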