Acta Agriculturae Zhejiangensis, 2025, Vol. 37, Issue (11): 2387-2394. DOI: 10.3969/j.issn.1004-1524.20241100

• Biosystems Engineering •

Breed recognition of sheep based on SNP and machine learning algorithms

SUN Shuo1,2, LIU Zhaohua3,4, WANG Ke3,4, ZHENG Jiye1,2,*, XING Fanbin1,2, SONG Xianxue1,2, WANG Jianying3,4, MENG Xianfeng3,4, YANG Jingchao5, ZHANG Xia1

    1. Department of Physical Science and Information Engineering, Liaocheng University, Liaocheng 252000, Shandong, China
    2. Institute of Agricultural Information and Economics, Shandong Academy of Agricultural Sciences, Jinan 250100, China
    3. Shandong Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
    4. Key Laboratory of Livestock and Poultry Genomics, Ministry of Agriculture and Rural Affairs, Jinan 250100, China
    5. Shandong Provincial Animal Husbandry Station, Jinan 250100, China
  • Received: 2024-12-20 Online: 2025-11-25 Published: 2025-12-08

Abstract:

To identify the optimal combination of methods for sheep breed recognition, we systematically compared the recognition accuracy of three single nucleotide polymorphism (SNP) screening methods [fixation index (FST), informativeness for assignment (In), and minimum redundancy maximum relevance (mRMR)] and five machine learning algorithms [multilayer perceptron (MLP), extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN)] under varying numbers of reference SNPs, using SNP genotyping data from 256 sheep across 11 breeds after data quality control. In most cases, FST gave the best screening performance and the SVM algorithm showed a clear advantage, while the number of SNPs significantly influenced recognition accuracy. Among all combinations, SVM combined with FST screening performed best at 1,400 SNPs, with a breed recognition accuracy of 99.54%. These findings clarify how breed recognition effectiveness differs across combinations of screening method and algorithm, and provide support for protecting sheep breed diversity, maintaining ecological balance, and improving specific traits.
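The FST screening step described above can be illustrated with a minimal sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) and uses a simple heterozygosity-based estimator, FST = (HT - HS) / HT, with breeds weighted by sample size. The abstract does not state which FST estimator the study used, so the estimator, the function names `fst_per_snp` and `top_k_snps`, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def fst_per_snp(genotypes, breeds):
    """Return one FST estimate per SNP (assumed estimator: (HT - HS) / HT).

    genotypes: list of per-animal lists, each entry 0, 1 or 2 (alt-allele count)
    breeds:    list of breed labels, one per animal
    """
    n_snps = len(genotypes[0])
    rows_by_breed = defaultdict(list)
    for row, breed in zip(genotypes, breeds):
        rows_by_breed[breed].append(row)

    total = len(genotypes)
    scores = []
    for j in range(n_snps):
        h_s = 0.0    # mean within-breed expected heterozygosity
        p_bar = 0.0  # sample-size-weighted mean allele frequency
        for rows in rows_by_breed.values():
            weight = len(rows) / total
            p = sum(r[j] for r in rows) / (2 * len(rows))
            p_bar += weight * p
            h_s += weight * 2 * p * (1 - p)
        h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of pooled sample
        # A SNP that is monomorphic overall carries no breed information.
        scores.append((h_t - h_s) / h_t if h_t > 0 else 0.0)
    return scores


def top_k_snps(scores, k):
    """Indices of the k SNPs with the highest FST, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): 4 animals, 2 breeds, 3 SNPs.
toy_genotypes = [[2, 1, 0], [2, 1, 0], [0, 1, 0], [0, 1, 2]]
toy_breeds = ["A", "A", "B", "B"]
toy_scores = fst_per_snp(toy_genotypes, toy_breeds)
selected = top_k_snps(toy_scores, 2)
```

In a workflow like the one compared in the study, the selected SNP columns would then be passed to a classifier such as an SVM, and the number of retained SNPs (`k` here) would be varied to see how it affects recognition accuracy.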

Key words: single nucleotide polymorphism (SNP), machine learning algorithm, sheep, breed recognition, support vector machine (SVM), fixation index (FST)
