Acta Agriculturae Zhejiangensis

• Food Science •

A numerical characterization of protein sequences based on k-words and its application

  

  1. (College of Mathematics and Physics, Bohai University, Jinzhou, Liaoning 121013, China)
  • Online: 2014-11-25  Published: 2014-12-02

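Abstract: Using a 5-letter classification model of the amino acids, each protein sequence is first converted into a 5-letter sequence. The frequencies of the 1-words and 2-words in that sequence then map it to a 30-dimensional vector (5 one-words plus 5² = 25 two-words). The Euclidean distance between two such vectors is taken as the evolutionary distance between the corresponding species. Phylogenetic analyses of two groups of protein sequences based on these distances confirm the effectiveness of the method.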

Key words: 5-letter classification model, 30-dimensional vector, phylogenetic analysis
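The three steps summarized in the abstract (reduce each sequence to 5 classes, count its 1-words and 2-words, compare the resulting vectors by Euclidean distance) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not list the 5-letter classification model, so the five amino-acid groups below are a hypothetical partition chosen only so that all 20 standard residues are covered. The helper names (`to_reduced`, `kword_vector`, `distance_matrix`) are also hypothetical. Dropping non-standard residues and using raw, unnormalized counts are further assumptions.

```python
import math
from itertools import product

# HYPOTHETICAL 5-class partition of the 20 standard amino acids.
# The article's own 5-letter classification model is not given in the abstract;
# replace these groups with the paper's model before real use.
CLASS_LETTERS = "abcde"
GROUPS = ["CMFILVWY", "ATH", "GP", "DE", "SNQRK"]
AA_TO_CLASS = {aa: c for c, group in zip(CLASS_LETTERS, GROUPS) for aa in group}

# The 30 coordinates: 5 one-words, then 25 two-words in lexicographic order.
WORDS = list(CLASS_LETTERS) + ["".join(p) for p in product(CLASS_LETTERS, repeat=2)]


def to_reduced(protein: str) -> str:
    """Rewrite a protein as a 5-letter sequence.

    Assumption: residues outside the 20 standard ones (e.g. X, B, Z) are dropped.
    """
    return "".join(AA_TO_CLASS[aa] for aa in protein.upper() if aa in AA_TO_CLASS)


def kword_vector(protein: str) -> list:
    """Return the 30-D vector of raw 1-word and overlapping 2-word counts."""
    s = to_reduced(protein)
    counts = dict.fromkeys(WORDS, 0)
    for ch in s:                      # 1-words: single class letters
        counts[ch] += 1
    for i in range(len(s) - 1):       # 2-words: overlapping adjacent pairs
        counts[s[i:i + 2]] += 1
    return [counts[w] for w in WORDS]


def distance_matrix(seqs: dict):
    """Pairwise Euclidean distances between the 30-D vectors of named sequences."""
    names = list(seqs)
    vecs = [kword_vector(seqs[n]) for n in names]
    return names, [[math.dist(u, v) for v in vecs] for u in vecs]


if __name__ == "__main__":
    # Made-up toy sequences for illustration only (not data from the article).
    toy = {"s1": "MKVLAAGIVG", "s2": "MKVLSAGIVG", "s3": "DEDEKRKRNQ"}
    names, d = distance_matrix(toy)
    for name, row in zip(names, d):
        print(name, " ".join(f"{x:6.2f}" for x in row))
```

The symmetric matrix returned by `distance_matrix` is the kind of input that distance-based tree-building methods such as UPGMA or neighbor joining take. The abstract does not say which tree-building method the authors used.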
