Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling
Abstract
Intelligent project management systems rely on high-quality historical data for accurate automated decision-making, yet missing data in software project repositories remains a persistent problem that degrades estimation performance. This study proposes an Intelligent Decision Support Framework (IDSF) for software development effort estimation (SDEE) that integrates Multiple Imputation (MI) as a data quality enhancement layer within the Analogy-Based Effort Estimation (ABEE) model. The framework is evaluated on the International Software Benchmarking Standards Group (ISBSG) dataset by systematically comparing six imputation strategies. The results show that the MI-enhanced framework achieves competitive and more stable Mean Magnitude of Relative Error (MMRE) values while preserving the full dataset, in contrast to traditional deletion methods, which cause substantial data loss. In addition, a theoretical analysis of Long Short-Term Memory (LSTM) networks is provided as a prospective deep learning estimator, highlighting that high-quality restored data is a prerequisite for effective LSTM training. This work contributes to intelligent systems in software engineering by establishing MI as a robust data quality module and as a foundation for more reliable AI-driven project management systems.
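The pipeline described in the abstract (impute, estimate by analogy, score with MMRE) can be illustrated with a minimal sketch. It is not the authors' implementation: the random hot-deck draw used for imputation, the Euclidean distance, the choice of k, the pooling by simple averaging, and all function names and toy values are assumptions made for illustration only.

```python
import math
import random


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def impute_once(rows, rng):
    """Return one completed copy of rows.

    Each missing value (None) is replaced by a random draw from the observed
    values of the same column. This hot-deck draw is an assumed, simplified
    stand-in for a model-based imputation step.
    """
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def analogy_estimate(features, efforts, target, k=2):
    """Analogy-based estimate: mean effort of the k closest historical projects."""
    ranked = sorted(range(len(features)), key=lambda i: math.dist(features[i], target))
    return sum(efforts[i] for i in ranked[:k]) / k


def mi_analogy_estimate(features, efforts, target, m=5, k=2, seed=0):
    """Build m completed datasets, estimate on each, and pool by averaging."""
    rng = random.Random(seed)
    estimates = [
        analogy_estimate(impute_once(features, rng), efforts, target, k)
        for _ in range(m)
    ]
    return sum(estimates) / m


if __name__ == "__main__":
    # Hypothetical toy projects: [size, team size]; None marks a missing value.
    history = [[100, 5], [120, None], [300, 12], [None, 10], [90, 4]]
    effort_hours = [400, 480, 1500, 1300, 350]
    print(mi_analogy_estimate(history, effort_hours, target=[110, 5]))
    print(mmre([100, 200], [110, 180]))
```

Unlike listwise deletion, every historical project stays available as a candidate analogue in each completed dataset. In practice, features would normally be scaled before computing distances, and the imputation model would condition on the other attributes rather than draw at random.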
Keywords
intelligent decision support, software development, intelligent project management
Cite This Article
TY  - JOUR
AU  - Hayat, Shahida
AU  - Akbar, Wajahat
AU  - Hussain, Tariq
AU  - Haq, Muhammad Inam Ul
AU  - Hussian, Altaf
AU  - Khalil, Irshad
AU  - Khan, Muhammad Nawaz
AU  - Diana, Samsonova
PY  - 2024
DA  - 2024/11/12
TI  - Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling
JO  - ICCK Transactions on Intelligent Systematics
T2  - ICCK Transactions on Intelligent Systematics
JF  - ICCK Transactions on Intelligent Systematics
VL  - 1
IS  - 3
SP  - 190
EP  - 202
DO  - 10.62762/TIS.2024.751418
UR  - https://www.icck.org/article/abs/TIS.2024.751418
KW  - intelligent decision support
KW  - software development
KW  - intelligent project management
AB  - Intelligent project management systems rely on high-quality historical data for accurate automated decision-making, yet missing data in software project repositories remains a persistent challenge that degrades intelligent estimation performance. This study proposes an Intelligent Decision Support Framework (IDSF) for software development effort estimation (SDEE) that integrates Multiple Imputation (MI) as a critical data quality enhancement layer within the Analogy-Based Effort Estimation (ABEE) model. The framework is evaluated on the ISBSG dataset by systematically comparing six imputation strategies. Results demonstrate that the MI-enhanced framework achieves competitive and more stable MMRE values while fully preserving dataset integrity, in contrast to traditional deletion methods that cause substantial data loss. Additionally, a theoretical analysis of Long Short-Term Memory (LSTM) networks is provided as a prospective deep learning estimator, highlighting that high-quality restored data is structurally necessary for effective LSTM training. This work contributes to intelligent systems in software engineering by establishing MI as a robust data quality module, laying a strong foundation for building more reliable AI-driven intelligent project management systems and advancing intelligent systematics research.
SN  - 3068-5079
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -
@article{Hayat2024Improving,
author = {Shahida Hayat and Wajahat Akbar and Tariq Hussain and Muhammad Inam Ul Haq and Altaf Hussian and Irshad Khalil and Muhammad Nawaz Khan and Samsonova Diana},
title = {Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling},
journal = {ICCK Transactions on Intelligent Systematics},
year = {2024},
volume = {1},
number = {3},
pages = {190-202},
doi = {10.62762/TIS.2024.751418},
url = {https://www.icck.org/article/abs/TIS.2024.751418},
abstract = {Intelligent project management systems rely on high-quality historical data for accurate automated decision-making, yet missing data in software project repositories remains a persistent challenge that degrades intelligent estimation performance. This study proposes an Intelligent Decision Support Framework (IDSF) for software development effort estimation (SDEE) that integrates Multiple Imputation (MI) as a critical data quality enhancement layer within the Analogy-Based Effort Estimation (ABEE) model. The framework is evaluated on the ISBSG dataset by systematically comparing six imputation strategies. Results demonstrate that the MI-enhanced framework achieves competitive and more stable MMRE values while fully preserving dataset integrity, in contrast to traditional deletion methods that cause substantial data loss. Additionally, a theoretical analysis of Long Short-Term Memory (LSTM) networks is provided as a prospective deep learning estimator, highlighting that high-quality restored data is structurally necessary for effective LSTM training. This work contributes to intelligent systems in software engineering by establishing MI as a robust data quality module, laying a strong foundation for building more reliable AI-driven intelligent project management systems and advancing intelligent systematics research.},
keywords = {intelligent decision support, software development, intelligent project management},
issn = {3068-5079},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.