
Comparative Analysis of Statistical and Machine Learning Models for Forecasting Maize and Walnut Production across Regions of Jammu & Kashmir, India

Author

Manish Kumar Sharma

Co-author

Nishant Jasrotia

Conference

10th International Conference on Agricultural Statistics

Format: CPS Paper - ICAS 2026

Keywords: agricultural statistics, predictive modeling

Abstract

The fusion of statistical modeling and machine learning algorithms in crop modeling brings together the strengths of both approaches. Statistical models rely on assumptions such as linearity, normality, and independence of errors, which often limit their effectiveness in complex, real-world agricultural scenarios. Penalized regression methods such as Ridge and Lasso help mitigate multicollinearity and overfitting, with Lasso additionally performing variable selection. Machine learning algorithms, namely Artificial Neural Networks (ANN), Time-Delayed Neural Networks (TDNN), and Support Vector Regression (SVR), excel at capturing complex, nonlinear relationships without the need for strict assumptions. These models learn adaptively from patterns in the data, making them robust to outliers and better suited to dynamic settings. Across crops and regions, machine learning models consistently outperformed traditional statistical models on accuracy and model-selection criteria such as RMSE, AIC, BIC, and R². Their forecasting advantage was greatest with larger datasets, shifting production patterns, and nonlinear interactions. While statistical models contribute interpretability and short-term forecasting, machine learning algorithms add predictive strength and adaptability, making them highly effective for long-term and complex agricultural prediction. This study applies statistical and machine learning models to forecast maize and walnut production across different regions of Jammu & Kashmir. For maize, Lasso regression performs better than Ridge and ordinary least squares (OLS) under statistical assumptions, with area and climatic factors as key drivers, while ANN models consistently outperform the statistical approaches by capturing nonlinear relationships and delivering higher prediction accuracy, especially for large datasets. Similarly, for walnut, ARIMA (autoregressive integrated moving average) and ARIMAX (ARIMA with exogenous variables) models provide reliable short-term estimates, but their assumptions restrict performance in complex scenarios.
Machine learning models, particularly ANNs and TDNNs, achieve superior accuracy (lower RMSE, higher R²) by adapting to dynamic, nonlinear trends, while SVR also shows competitive results. Overall, machine learning approaches outperform traditional statistical models for both crops, offering robust solutions for long-term and complex agricultural forecasting.
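The abstract compares OLS, Ridge, and Lasso regression and ranks models by RMSE, R², AIC, and BIC. The following is a minimal, pure-Python sketch that illustrates those ideas. It is not the authors' code, and it does not use their data. It rests on several assumptions: synthetic data in which x1 is informative and x2 is pure noise, a hypothetical penalty value lam = 0.3, coordinate descent as the fitting method, and the common Gaussian least-squares forms of AIC and BIC with additive constants dropped. With this penalty, Lasso sets the irrelevant coefficient to exactly zero, which is the variable-selection property the abstract mentions. Ridge only shrinks that coefficient toward zero.

```python
import math
import random


def center(values):
    """Subtract the mean so the fitted model needs no intercept."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def soft_threshold(a, lam):
    """Lasso shrinkage operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def coordinate_descent(columns, y, lam, penalty="lasso", n_iter=200):
    """Penalized least squares by cyclic coordinate descent on centered data.

    lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1
    ridge: minimize (1/2n)||y - Xb||^2 + (lam/2) * ||b||_2^2
    With lam = 0 both reduce to ordinary least squares (OLS).
    """
    n = len(y)
    X = [center(col) for col in columns]
    residual = center(y)  # residual when all coefficients are zero
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Partial residual: remove every predictor's effect except x_j's.
            partial = [r + x * beta[j] for r, x in zip(residual, xj)]
            rho = sum(x * p for x, p in zip(xj, partial)) / n
            z = sum(x * x for x in xj) / n
            if penalty == "lasso":
                beta[j] = soft_threshold(rho, lam) / z
            else:
                beta[j] = rho / (z + lam)
            residual = [p - x * beta[j] for p, x in zip(partial, xj)]
    return beta


def predict(columns, y, beta):
    """Fitted values on the original scale (intercept recovered from means)."""
    n = len(y)
    means = [sum(col) / n for col in columns]
    intercept = sum(y) / n - sum(b * m for b, m in zip(beta, means))
    return [intercept + sum(b * col[i] for b, col in zip(beta, columns))
            for i in range(n)]


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aic_bic(actual, predicted, k):
    """Gaussian least-squares AIC and BIC, up to an additive constant.

    AIC = n ln(RSS/n) + 2k,  BIC = n ln(RSS/n) + k ln(n); k = parameter count.
    """
    n = len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


# Synthetic, illustrative data only: x1 drives y, x2 is pure noise.
rng = random.Random(42)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [3.0 * a + rng.gauss(0, 0.5) for a in x1]

results = {}
for name, lam, penalty in [("OLS", 0.0, "lasso"),  # lam = 0 gives OLS
                           ("Ridge", 0.3, "ridge"),
                           ("Lasso", 0.3, "lasso")]:
    beta = coordinate_descent([x1, x2], y, lam, penalty)
    fitted = predict([x1, x2], y, beta)
    # Intercept + nonzero slopes; a simplification, especially for Ridge.
    k = 1 + sum(1 for b in beta if b != 0.0)
    results[name] = (beta, rmse(y, fitted), r_squared(y, fitted),
                     aic_bic(y, fitted, k))
    print(name, results[name])
```

The example uses coordinate descent only because it keeps the code free of dependencies. Library estimators such as scikit-learn's `Ridge` and `Lasso` are the practical choice. Note that `Ridge` scales its penalty differently from the objective in this sketch. In a real analysis, the penalty would normally be chosen by cross-validation rather than fixed. The sketch computes in-sample scores only, whereas forecast comparisons like those in the abstract are usually made on held-out years.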