Estimation of eggplant yield with machine learning methods using spectral vegetation indices

Taşan S., Cemek B., Tasan M., Canturk A.

COMPUTERS AND ELECTRONICS IN AGRICULTURE, vol.202, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 202
  • Publication Date: 2022
  • Doi Number: 10.1016/j.compag.2022.107367
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, BIOSIS, CAB Abstracts, Communication Abstracts, Computer & Applied Sciences, Environment Index, Food Science & Technology Abstracts, INSPEC, Metadex, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Crop yield prediction, Eggplant, Machine learning, Spectral vegetation indices, Remote sensing, CROP YIELD, NEURAL-NETWORKS, REFERENCE EVAPOTRANSPIRATION, LAI ESTIMATION, TIME-SERIES, WHEAT YIELD, PREDICTION, WATER, REFLECTANCE, IRRIGATION
  • Ondokuz Mayıs University Affiliated: Yes


Estimating crop yields is an essential condition for accurate and timely agricultural planning. Remotely sensed products, such as spectral vegetation indices (VIs), are widely used in the estimation of crop yields. Integrating remotely sensed data into machine learning methods has the potential to support a real-time management system specific to the area of interest. The main aim of the study was to estimate eggplant yield under field conditions from VIs obtained with a handheld spectroradiometer, using five machine learning methods (artificial neural networks (ANN), support vector regression (SVR), k-nearest neighbors (kNN), random forests (RF), and adaptive boosting (AB)), and to compare the performance of the methods. The data were obtained from field experiments aimed at determining the most suitable irrigation program for eggplant production in a semi-humid climate region in northern Turkey during the 2015, 2016 and 2017 growing seasons. There were five irrigation treatments: full water application (I1: 100%) and deficit ratios of full water application (I2: I1 × 75%, I3: I1 × 50%, I4: I1 × 25%, and I5: rainfed). Input variables used in the yield estimation models were selected by correlation analysis and principal component analysis (PCA). The model inputs were different combinations of 10 VIs, the number of days after planting (DAP), and water application coefficients. In addition, an alternative approach was proposed in which PCA components were used as inputs for yield estimation. All machine learning models estimated yield with higher accuracy when using PCA-based inputs than with the other input combinations.
The best results were obtained with the ANN model using PCA-based inputs; therefore, this model was chosen for eggplant yield estimation (coefficient of determination (R²) = 0.973, mean absolute error (MAE) = 274.816 kg ha⁻¹, root mean square error (RMSE) = 352.787 kg ha⁻¹ and Nash-Sutcliffe efficiency (NSE) = 0.951). The lowest accuracy for yield estimation was recorded with the RF model. The prediction accuracy of the models using a single VI as input was low. Green index (GI) and green vegetation index (GVI) had the highest impact on eggplant yield, and yield was estimated with higher accuracy using these indices, which are sensitive to chlorophyll absorption. The findings of the current study demonstrate the benefits of using remotely sensed data and PCA together in machine learning models to estimate eggplant yield more reliably and accurately at the regional scale.
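
For readers who want to reproduce the four evaluation statistics named in the abstract, the following is a minimal Python sketch of how they are commonly computed. It is not taken from the paper. The function names and the sample yields are hypothetical, and R² is assumed here to be the squared Pearson correlation between observed and predicted yield, which would be consistent with the abstract reporting different values for R² and NSE.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the same units as the yield (e.g. kg/ha)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the same units as the yield."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_squared(obs, pred):
    """Assumption: R² taken as the squared Pearson correlation coefficient."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical values for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(nse(observed, predicted))
print(r_squared(observed, predicted))
```

In this toy case every prediction is offset by the same amount, so the correlation-based R² stays at 1 while NSE is penalized by the bias. This is one reason the two statistics are often reported side by side.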