Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling
Research Article  ·  Published: 12 November 2024
ICCK Transactions on Intelligent Systematics
Volume 1, Issue 3, 2024: 190-202


1 Department of Computer Science, University of Peshawar, Pakistan
2 School of Electronic and Control Engineering, Chang’an University, Xi’an 710064, China
3 School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou 310018, China
4 School of Mathematics and Statistics, Zhejiang Gongshang University, Hangzhou 310018, China
5 Department of Computer Science and Bioinformatics, Khushal Khan Khattak University Karak, Pakistan
6 Department of Health Science and Technology, Gachon University, Incheon 21936, Republic of Korea
7 Gachon Advanced Institute for Health Sciences and Technology, Gachon University, Incheon 21936, Republic of Korea
8 Department of Computer Science and Information Technology, University of Malakand, Chakdara, Pakistan
9 School of International Education, Zhejiang Gongshang University, Hangzhou 310018, China
* Corresponding Authors: Tariq Hussain, [email protected]; Samsonova Diana, [email protected]

Article Information

Abstract

Intelligent project management systems rely on high-quality historical data for accurate automated decision-making, yet missing data in software project repositories remains a persistent problem that degrades estimation performance. This study proposes an Intelligent Decision Support Framework (IDSF) for software development effort estimation (SDEE) that integrates Multiple Imputation (MI) as a data quality enhancement layer within the Analogy-Based Effort Estimation (ABEE) model. The framework is evaluated on the ISBSG dataset by systematically comparing six imputation strategies. The results show that the MI-enhanced framework achieves competitive and more stable Mean Magnitude of Relative Error (MMRE) values while preserving the full dataset, in contrast to traditional deletion methods, which cause substantial data loss. In addition, a theoretical analysis of Long Short-Term Memory (LSTM) networks is provided as a prospective deep learning estimator, highlighting that high-quality restored data is structurally necessary for effective LSTM training. This work contributes to intelligent systems in software engineering by establishing MI as a robust data quality module, laying a foundation for more reliable AI-driven project management systems and advancing intelligent systematics research.
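
The abstract reports results as MMRE values. As a minimal illustration of that metric only, the sketch below computes MMRE using its standard definition, the mean over projects of |actual − predicted| / actual. The function name `mmre` and the sample effort values are hypothetical and are not taken from the article or the ISBSG dataset.

```python
from typing import Sequence


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Magnitude of Relative Error over a set of projects.

    Standard definition: mean of |actual - predicted| / actual.
    Lower values indicate estimates closer to the actual effort.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one project is required")
    if any(a <= 0 for a in actual):
        raise ValueError("actual effort values must be positive")

    # Magnitude of relative error (MRE) for each project, then the mean.
    mre_values = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(mre_values) / len(mre_values)


# Hypothetical effort values (for example, person-hours), for illustration only.
actual_effort = [100.0, 200.0, 400.0]
estimated_effort = [110.0, 180.0, 400.0]
score = mmre(actual_effort, estimated_effort)
```

Because each error is divided by the actual effort, the metric is undefined for projects with zero recorded effort, which is why the sketch rejects non-positive actual values.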

Keywords

intelligent decision support, software development, intelligent project management

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style
Hayat, S., Akbar, W., Hussain, T., Haq, M. I. U., Hussian, A., Khalil, I., Khan, M. N., & Diana, S. (2024). Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling. ICCK Transactions on Intelligent Systematics, 1(3), 190-202. https://doi.org/10.62762/TIS.2024.751418
BibTeX Format
@article{Hayat2024Improving,
  author = {Shahida Hayat and Wajahat Akbar and Tariq Hussain and Muhammad Inam Ul Haq and Altaf Hussian and Irshad Khalil and Muhammad Nawaz Khan and Samsonova Diana},
  title = {Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling},
  journal = {ICCK Transactions on Intelligent Systematics},
  year = {2024},
  volume = {1},
  number = {3},
  pages = {190-202},
  doi = {10.62762/TIS.2024.751418},
  url = {https://www.icck.org/article/abs/TIS.2024.751418},
  keywords = {intelligent decision support, software development, intelligent project management},
  issn = {3068-5079},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ISSN: 3068-5079 (Online) | ISSN: 3069-003X (Print)
Preserved at Portico