Acta Agriculturae Zhejiangensis, 2025, Vol. 37, Issue (11): 2387-2394. DOI: 10.3969/j.issn.1004-1524.20241100

• Biosystems Engineering •

Research on sheep breed recognition algorithms based on SNP and machine learning

SUN Shuo1,2, LIU Zhaohua3,4, WANG Ke3,4, ZHENG Jiye1,2,*, XING Fanbin1,2, SONG Xianxue1,2, WANG Jianying3,4, MENG Xianfeng3,4, YANG Jingchao5, ZHANG Xia1

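    1. School of Physical Science and Information Engineering, Liaocheng University, Liaocheng 252000, Shandong, China
    2. Institute of Agricultural Information and Economics, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
    3. Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences; Shandong Key Laboratory of Animal Disease Control and Breeding, Jinan 250100, Shandong, China
    4. Key Laboratory of Livestock and Poultry Genomics, Ministry of Agriculture and Rural Affairs, Jinan 250100, Shandong, China
    5. Shandong Provincial Animal Husbandry Station, Jinan 250100, Shandong, China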
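  • Received: 2024-12-20 Online: 2025-11-25 Published: 2025-12-08
  • About the author: SUN Shuo (born 2001), male, from Qingdao, Shandong; master's student; research interest: applications of machine learning in agriculture. E-mail: stnc4478@126.com
  • Corresponding author: *ZHENG Jiye, E-mail: jiyezheng@163.com
  • Funding:
    Shandong Province Innovation Capability Improvement Project for Science and Technology-based Small and Medium-sized Enterprises (2024TSGC0082); Shandong Province Agricultural Elite Variety Project (2021LGGC010); Shandong Province Central Government Guided Local Science and Technology Development Fund Project (YDZX2023131)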

Breed recognition of sheep based on SNP and machine learning algorithms

SUN Shuo1,2, LIU Zhaohua3,4, WANG Ke3,4, ZHENG Jiye1,2,*, XING Fanbin1,2, SONG Xianxue1,2, WANG Jianying3,4, MENG Xianfeng3,4, YANG Jingchao5, ZHANG Xia1

    1. Department of Physical Science and Information Engineering, Liaocheng University, Liaocheng 252000, Shandong, China
    2. Institute of Agricultural Information and Economics, Shandong Academy of Agricultural Sciences, Jinan 250100, China
    3. Shandong Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
    4. Key Laboratory of Livestock and Poultry Genomics, Ministry of Agriculture and Rural Affairs, Jinan 250100, China
    5. Shandong Provincial Animal Husbandry Station, Jinan 250100, China
  • Received: 2024-12-20 Online: 2025-11-25 Published: 2025-12-08

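Abstract: To determine the optimal combination of methods for sheep breed recognition, single nucleotide polymorphism (SNP) genotyping data from 256 sheep of 11 breeds were used. After data quality control, three SNP screening methods [fixation index (FST), informativeness for assignment (In), and minimum redundancy maximum relevance (mRMR)] and five machine learning algorithms [multilayer perceptron (MLP), extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN)] were systematically compared for breed recognition accuracy under different numbers of reference SNPs. The results showed that, in most cases, FST gave the best screening performance, the SVM algorithm had a clear advantage, and the number of SNPs markedly affected recognition accuracy. Among all combinations, SVM combined with FST screening performed best at 1 400 SNPs, with a breed recognition accuracy of 99.54%. These results help explain the differences in breed recognition performance among combinations and support the protection of sheep breed diversity and the improvement of specific traits.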
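
For orientation, two of the screening scores named above have common textbook forms for a biallelic SNP genotyped in K breeds. The abstract does not state which estimators the authors used, so the expressions below are an assumption: a Nei-style FST and the informativeness for assignment as usually defined. Here p_i is the allele frequency in breed i, p_ij is the frequency of allele j in breed i, and N is the number of alleles at the locus.

```latex
% Assumed (textbook) definitions; the estimators used in the study are not given in the abstract.
\begin{align}
  \bar{p} &= \frac{1}{K}\sum_{i=1}^{K} p_i, \qquad
  H_T = 2\,\bar{p}\,(1-\bar{p}), \qquad
  H_S = \frac{1}{K}\sum_{i=1}^{K} 2\,p_i\,(1-p_i), \\
  F_{ST} &= \frac{H_T - H_S}{H_T}, \\
  I_n &= \sum_{j=1}^{N}\left( -\,p_j \ln p_j
         + \frac{1}{K}\sum_{i=1}^{K} p_{ij} \ln p_{ij} \right),
  \qquad p_j = \frac{1}{K}\sum_{i=1}^{K} p_{ij},
\end{align}
% with the convention 0 \ln 0 = 0.
```

Both scores are zero when every breed has the same allele frequencies and grow as the frequencies diverge, which is why either can be used to rank SNPs for breed assignment.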

Key words: single nucleotide polymorphism (SNP), machine learning, sheep, breed recognition, support vector machine (SVM), fixation index (FST)

Abstract:

To identify the optimal combination of methods for sheep breed recognition, we used single nucleotide polymorphism (SNP) genotyping data from 256 sheep across 11 breeds. After data quality control, we systematically compared the breed recognition accuracy of three SNP screening methods [fixation index (FST), informativeness for assignment (In), and minimum redundancy maximum relevance (mRMR)] and five machine learning algorithms [multilayer perceptron (MLP), extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN)] under varying numbers of reference SNPs. The results indicated that, in most cases, FST gave the best screening performance, the SVM algorithm showed a clear advantage, and the number of SNPs significantly influenced recognition accuracy. Among all combinations, the SVM algorithm combined with FST screening performed best at 1 400 SNPs, with a breed recognition accuracy of 99.54%. These findings help explain the differences in breed recognition performance across combinations and support the protection of sheep breed diversity, the maintenance of ecological balance, and the improvement of specific traits.
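
The screen-then-classify workflow summarised in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. It assumes a Nei-style FST, (H_T - H_S) / H_T, as the screening score, and it uses KNN (one of the five algorithms compared) rather than SVM only because KNN fits in a few lines without external libraries. The function names and the eight-animal toy genotypes are hypothetical and exist purely for illustration.

```python
from collections import Counter


def fst_per_snp(genotypes, breeds):
    """Nei-style F_ST = (H_T - H_S) / H_T for each biallelic SNP.

    genotypes: list of rows (one per animal) of 0/1/2 alternate-allele counts.
    breeds:    list of breed labels, one per animal.
    """
    labels = sorted(set(breeds))
    scores = []
    for j in range(len(genotypes[0])):
        # Alternate-allele frequency of SNP j within each breed.
        freqs = []
        for b in labels:
            counts = [row[j] for row, lab in zip(genotypes, breeds) if lab == b]
            freqs.append(sum(counts) / (2 * len(counts)))
        p_bar = sum(freqs) / len(freqs)
        h_t = 2 * p_bar * (1 - p_bar)                           # total heterozygosity
        h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-breed
        # Monomorphic SNPs (H_T == 0) carry no information; score them 0.
        scores.append((h_t - h_s) / h_t if h_t > 0 else 0.0)
    return scores


def top_snps(genotypes, breeds, n_snps):
    """Indices of the n_snps SNPs with the highest F_ST."""
    fst = fst_per_snp(genotypes, breeds)
    return sorted(range(len(fst)), key=lambda j: -fst[j])[:n_snps]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training animals closest to the query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(genotypes, breeds, n_snps, k=3):
    """Leave-one-out accuracy; SNPs are re-ranked on each training set so the
    held-out animal never influences which markers are selected."""
    hits = 0
    for i in range(len(genotypes)):
        train_x = genotypes[:i] + genotypes[i + 1:]
        train_y = breeds[:i] + breeds[i + 1:]
        keep = top_snps(train_x, train_y, n_snps)
        reduced = [[row[j] for j in keep] for row in train_x]
        query = [genotypes[i][j] for j in keep]
        hits += knn_predict(reduced, train_y, query, k) == breeds[i]
    return hits / len(genotypes)


# Hypothetical toy data: SNP 0 separates the breeds, SNP 1 has equal
# frequencies in both breeds, SNP 2 is monomorphic.
toy_genotypes = [
    [0, 0, 0], [0, 2, 0], [0, 1, 0], [0, 1, 0],   # breed "A"
    [2, 1, 0], [2, 1, 0], [2, 2, 0], [2, 0, 0],   # breed "B"
]
toy_breeds = ["A"] * 4 + ["B"] * 4

print(fst_per_snp(toy_genotypes, toy_breeds))          # → [1.0, 0.0, 0.0]
print(loo_accuracy(toy_genotypes, toy_breeds, n_snps=1))  # → 1.0
```

Ranking the SNPs inside each training split, rather than once on the full data set, is a design choice that avoids optimistic accuracy estimates. Whether the study ranked SNPs this way is not stated in the abstract.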

Key words: single nucleotide polymorphism (SNP), machine learning algorithm, sheep, breed recognition, support vector machine (SVM), fixation index (FST)

CLC number: