Lightweight and improved apple orchard fruit recognition model CS_YOLOv7

Acta Agriculturae Zhejiangensis, 2026, Vol. 38, Issue (2): 383-396. DOI: 10.3969/j.issn.1004-1524.20250100

• Biosystems Engineering •

OUYANG Yu, LIU Shuo, LI Mengmin, ZHANG Peng

School of Mathematics & Computer Science, Wuhan Polytechnic University, Wuhan 430048, China

Received: 2025-02-10 Online: 2026-02-25 Published: 2026-03-24

Abstract

Fruit recognition models for apple orchards currently suffer from large parameter counts, high computational cost, and a poor trade-off between detection accuracy and speed. To address these problems, a lightweight improved model, CS_YOLOv7, was proposed based on YOLOv7. Firstly, the channel-split efficient layer aggregation network (CS_ELAN) and the spatial pyramid pooling fast (SPPF) module were introduced to make the model lightweight overall. Secondly, the K-means++ algorithm was used to generate new anchor boxes suited to the dataset in this study, which enhanced the model's target localization capability. Thirdly, the Wise-IoU loss function replaced the original loss function, which reduced the harmful gradients from low-quality samples and improved convergence speed and localization accuracy. Finally, an attention mechanism, SE_CBAM, operating on the spatial and channel dimensions was added so that the model could extract key features of small apple targets from a more global perspective. Compared with the original YOLOv7 model, the improved model increased the mean average precision at an intersection over union threshold of 0.5 (mAP@0.5) by 1.7 percentage points, reduced the model size by 22.3 MB, and increased the detection speed by 118.9 frames·s^-1, while the number of parameters and the computational complexity decreased by 31.8% and 16.1%, respectively. CS_YOLOv7 thus becomes lighter in several respects while improving accuracy. It can be applied to the rapid recognition of young apple fruits in orchard datasets and lays a foundation for efficient real-time target recognition and subsequent robotic picking.

Key words: apple orchard, fruit recognition, machine vision, model optimization

CLC Number: S24; S661.1

OUYANG Yu, LIU Shuo, LI Mengmin, ZHANG Peng. Lightweight and improved apple orchard fruit recognition model CS_YOLOv7[J]. Acta Agriculturae Zhejiangensis, 2026, 38(2): 383-396.

Figures/Tables 14

Fig.1 Examples of dataset augmentation. a, Original image; b, Random cropping; c, Random translation; d, Random flipping plus Gaussian noise; e, Random rotation plus random brightness.

Fig.2 Examples of images under different conditions. a, Backlighting; b, Frontlighting; c, Occlusion by fruit leaves; d, Occlusion by lampposts; e, Early morning sunrise scene; f, Midday strong sunlight scene; g, Sparse fruit scene; h, Dense fruit scene.

Fig.3 Overall structure of the CS_YOLOv7 network model. cat, Concat; Conv, Convolution; MaxPool, Max pooling; BN, Batch normalization; SiLU, Sigmoid linear unit; REP, Re-parameterized convolution; SE_CBAM, Squeeze-and-excitation_convolutional block attention module. The same as below.

Fig.4 Structure of CS_ELAN

Table 1 Anchor boxes before and after re-clustering

Detection layer size	Anchor box size before adjustment	Anchor box size after adjustment
80×80	[12, 16], [19, 36], [40, 28]	[9, 12], [15, 29], [32, 23]
40×40	[36, 75], [76, 55], [72, 146]	[29, 60], [63, 44], [59, 119]
20×20	[142, 110], [192, 243], [459, 401]	[171, 132], [231, 292], [550, 483]
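
The re-clustering behind Table 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes the common YOLO practice of clustering box widths and heights with a 1 - IoU distance, and `wh_iou`, `kmeanspp_anchors`, and the synthetic `demo_boxes` are hypothetical names and data introduced here.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, all aligned at one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeanspp_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors using K-means++ seeding."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    # K-means++ seeding: the first centre is random; each further centre is drawn
    # with probability proportional to the squared distance to its nearest centre.
    centers = [boxes[rng.integers(len(boxes))]]
    while len(centers) < k:
        d2 = (1.0 - wh_iou(boxes, np.array(centers))).min(axis=1) ** 2
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(boxes), 1.0 / len(boxes))
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    anchors = np.array(centers)
    # Standard Lloyd iterations: assign each box to its best anchor, then re-average.
    for _ in range(iters):
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        updated = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
            for j in range(k)
        ])
        if np.allclose(updated, anchors):
            break
        anchors = updated
    # Sort by area so the smallest anchors go to the highest-resolution layer.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Synthetic widths and heights in pixels (assumed data, not the paper's dataset).
demo_boxes = np.random.default_rng(1).uniform(8, 400, size=(500, 2))
anchors = kmeanspp_anchors(demo_boxes, k=9)
print(np.round(anchors).astype(int).reshape(3, 3, 2))
```

Sorting by area lets the nine sizes be split three per detection layer, from the 80×80 layer (smallest anchors) to the 20×20 layer (largest anchors), which is the layout Table 1 uses.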

Fig.5 Structure of the SE_CBAM module. a, Squeeze-and-excitation channel attention module; b, Spatial attention module; c, Convolutional block attention module.
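
The two building blocks named in Fig.5 can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions: the exact wiring of SE_CBAM is not given on this page, so the order used here (squeeze-and-excitation channel gating, then CBAM-style spatial gating), the reduction ratio of 4, the 7×7 kernel, and the random weights are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(x, w1, w2):
    """Squeeze-and-excitation gate. x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = x.mean(axis=(1, 2))             # squeeze: global average pool per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)    # excitation: bottleneck layer with ReLU
    gate = sigmoid(w2 @ hidden)                # one weight in (0, 1) per channel
    return x * gate[:, None, None]

def spatial_attention(x, kernel):
    """CBAM-style spatial gate. kernel: (2, k, k) with odd k, applied with zero padding."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W) channel-wise mean and max
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    logits = np.zeros((h, w))
    for i in range(k):                                  # plain cross-correlation loop
        for j in range(k):
            logits += (kernel[:, i, j, None, None] * padded[:, i:i + h, j:j + w]).sum(axis=0)
    return x * sigmoid(logits)[None]                    # one weight in (0, 1) per pixel

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))                         # toy feature map (C=16, H=W=8)
w1 = rng.normal(size=(4, 16))                           # assumed reduction ratio r = 4
w2 = rng.normal(size=(16, 4))
kernel = rng.normal(size=(2, 7, 7)) * 0.1               # assumed 7x7 spatial kernel
out = spatial_attention(se_channel_attention(x, w1, w2), kernel)
print(out.shape)
```

Both gates only rescale the input, so the output keeps the input shape and the block can be dropped between existing layers without changing tensor dimensions.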

Table 2 Comparison of performance before and after anchor box re-clustering. Y, Model with the original anchor boxes; G, Model with the improved anchor boxes.

Model	P/%	R/%	mAP@0.5/%
Y	73.5	67.4	73.4
G	73.5	69.3	74.5

Fig.6 Comparison of iteration curves before and after anchor box re-clustering. mAP@0.5, Mean average precision at an intersection over union (IoU) threshold of 0.5. The same as below. Y represents the model with the original anchor boxes, and G represents the model with the improved anchor boxes.

Table 3 Comparison of model performance under different loss functions. P, Precision; R, Recall; v, Detection speed.

Loss function	P/%	R/%	mAP@0.5/%	v/(frame·s^-1)
CIoU	73.5	69.3	74.5	106.3
EIoU	67.3	67.3	70.2	166.6
SIoU	68.3	59.3	65.0	169.4
MPDIoU	79.2	63.9	74.5	147.0
Wise-IoU	80.8	68.7	79.1	161.3
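
For readers unfamiliar with the loss in the last row, the following is a forward-only sketch of the simplest Wise-IoU variant (v1) as described by Tong et al.: the IoU loss is multiplied by a distance-based factor computed from the smallest enclosing box. Which Wise-IoU version the authors trained with is not stated on this page, and the dynamic focusing coefficient of later versions is omitted here; `wiou_v1` is a hypothetical helper name.

```python
import math

def wiou_v1(pred, gt):
    """Wise-IoU v1 for two boxes given as (x1, y1, x2, y2). Forward value only."""
    # Plain IoU.
    iw = max(min(pred[2], gt[2]) - max(pred[0], gt[0]), 0.0)
    ih = max(min(pred[3], gt[3]) - max(pred[1], gt[1]), 0.0)
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # Width and height of the smallest box enclosing both boxes. In training code
    # this denominator is detached from the gradient; that detail is not modelled here.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared distance between the two box centres.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    attention = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return attention * (1.0 - iou)

gt_box = (0.0, 0.0, 10.0, 10.0)
loss_same = wiou_v1(gt_box, gt_box)                       # perfectly aligned prediction
loss_shifted = wiou_v1((5.0, 0.0, 15.0, 10.0), gt_box)    # prediction shifted by half a width
print(loss_same, loss_shifted)
```

The factor is at least 1, so a prediction whose centre is far from the ground-truth centre is penalised more than its IoU alone would suggest, while a perfectly aligned box still has zero loss.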

Fig.7 Iteration curves of mAP@0.5 under different loss functions

Table 4 Results of ablation experiments. N, Number of parameters; v, Detection speed; √, Improvement used; ×, Improvement not used.

No.	CS_ELAN+SPPF	K-means++	Wise-IoU	SE_CBAM	N	v/(frame·s^-1)	mAP@0.5/%
1	×	×	×	×	37 196 556	32.6	78.3
2	√	×	×	×	25 226 060	166.6	73.4
3	√	√	×	×	25 226 060	106.3	74.5
4	√	×	√	×	25 226 060	166.6	71.3
5	√	√	√	×	25 226 060	161.3	79.1
6	√	√	√	√	25 367 426	151.5	80.0

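Table 5 Comparison of target detection performance of different models. N, Number of parameters; FLOPs, Floating-point operations; v, Detection speed; S, Model size.

Model	N	FLOPs/10⁹	v/(frame·s^-1)	mAP@0.5/%	S/MB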
SSD	24 386 000	87.5	82.3	65.2	100.0
Faster R-CNN	41 753 000	83.8	55.0	65.7	314.0
RetinaNet	32 201 069	127.2	50.0	76.1	245.0
YOLOv4	63 937 686	170.0	70.0	71.6	244.0
YOLOv5x	87 244 374	217.0	51.0	81.3	1 000.0
YOLOv7	37 196 556	105.0	32.6	78.3	71.0
YOLOv8	25 902 640	79.3	101.9	78.4	49.6
YOLOv9	25 590 912	104.0	79.2	79.5	49.2
Tiny-YOLO	6 014 988	13.2	588.2	72.7	6.3
MobileNet-YOLO	24 616 556	41.3	161.0	78.1	49.7
YOLOv12	26 454 880	89.7	107.5	80.2	53.5
CS_YOLOv7	25 367 426	88.1	151.5	80.0	48.7
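
The headline figures in the abstract follow directly from the YOLOv7 and CS_YOLOv7 rows of Table 5. The short check below recomputes them; the dictionary keys and the helper `pct_drop` are names introduced here for illustration only.

```python
# Values copied from the YOLOv7 and CS_YOLOv7 rows of Table 5.
yolov7 = {"params": 37_196_556, "gflops": 105.0, "fps": 32.6, "map50": 78.3, "size_mb": 71.0}
cs_yolov7 = {"params": 25_367_426, "gflops": 88.1, "fps": 151.5, "map50": 80.0, "size_mb": 48.7}

def pct_drop(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

summary = {
    "map50_gain_points": round(cs_yolov7["map50"] - yolov7["map50"], 1),
    "size_drop_mb": round(yolov7["size_mb"] - cs_yolov7["size_mb"], 1),
    "fps_gain": round(cs_yolov7["fps"] - yolov7["fps"], 1),
    "params_drop_pct": round(pct_drop(yolov7["params"], cs_yolov7["params"]), 1),
    "flops_drop_pct": round(pct_drop(yolov7["gflops"], cs_yolov7["gflops"]), 1),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

The results are 1.7 percentage points, 22.3 MB, 118.9 frames·s^-1, 31.8%, and 16.1%, matching the values reported in the abstract.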

Fig.8 Recognition of young apple fruits by different models under different lighting conditions. The purple circles in the picture indicate missed or false detections of young apple fruits. The “Early morning sunrise” and “Midday strong sunlight” scenes were photographed at the same location on the fruit tree at different times.

Table 6 Comparison of detection performance of different models on different datasets

Dataset	Model	P/%	R/%	mAP@0.5/%
Subset 1	YOLOv7	76.0	78.0	74.0
Subset 1	MobileNet-YOLO	84.6	73.4	70.0
Subset 1	YOLOv12	96.0	61.3	77.4
Subset 1	CS_YOLOv7	78.6	76.7	77.8
Subset 2	YOLOv7	85.4	71.4	67.8
Subset 2	MobileNet-YOLO	92.0	69.1	69.2
Subset 2	YOLOv12	89.5	60.4	76.9
Subset 2	CS_YOLOv7	82.7	77.7	74.1
