Maternal Health Risk Prediction in Bangladesh Using Machine Learning

Shubham Shirodkar; Raza Hasan; Salman Mahmood

doi:10.62762/JAIB.2026.495804

Article Information

Published in Journal of Artificial Intelligence in Bioinformatics

Volume/Issue Volume 2, Issue 1, 2026

Pages 1-21

Abstract

Maternal mortality risk in Bangladesh remains a critical public health challenge, compounded by rural access gaps and the absence of scalable, data-driven early-warning systems. This study presents a reproducible, interpretable machine learning framework for maternal health risk classification using an IoT-collected dataset of 1,014 patient records and six physiological indicators. A deduplication audit identified 562 repeated sensor readings, which are documented in the exploratory analysis. The pipeline comprises five clinically grounded engineered features - Mean Arterial Pressure, Shock Index, Pulse Pressure, BP Ratio, and Composite Risk Score - alongside SMOTE-based class imbalance correction applied strictly post-split to prevent data leakage. Seven classifiers were systematically evaluated across two experimental tracks: the raw six-feature dataset and the eleven-feature engineered dataset. On the raw six-feature dataset with SMOTE (training: 811 to 1,218 samples after oversampling; test: 203 samples), Random Forest achieved the best overall performance (Accuracy: 88.2%; Macro Recall: 0.889; F1: 0.888; AUC: 0.966), confirming its suitability as the champion model. XGBoost achieved the highest AUC (0.967) with marginally lower Macro Recall (0.868). Feature importance analysis revealed Blood Sugar (28.4% MDI) and the engineered Composite Risk Score (12.2% MDI) as the two dominant predictors, validating the clinical feature engineering approach. Feature engineering benefited weaker models most (Logistic Regression: +3.3 percentage points in Macro Recall), while the strongest tree ensembles marginally preferred the SMOTE-balanced raw feature space. An interactive Tableau dashboard translates predictive outputs into accessible visual analytics for clinical and policy decision support.
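The abstract names the engineered features but does not give their formulas. As a minimal sketch, the first three can be computed from the standard clinical definitions (Mean Arterial Pressure as diastolic pressure plus one third of pulse pressure, Shock Index as heart rate divided by systolic pressure, Pulse Pressure as systolic minus diastolic). BP Ratio is assumed here to be systolic divided by diastolic pressure, which may differ from the authors' definition, and the Composite Risk Score is omitted because it is not defined in this text. The names Vitals and engineered_features are hypothetical and used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """Hypothetical container for the three readings the features need."""
    systolic_bp: float   # mmHg
    diastolic_bp: float  # mmHg
    heart_rate: float    # beats per minute


def engineered_features(v: Vitals) -> dict:
    """Derive blood-pressure features from raw vitals.

    The first three use standard clinical definitions. "bp_ratio" is an
    ASSUMPTION (systolic / diastolic); the source does not define it.
    """
    pulse_pressure = v.systolic_bp - v.diastolic_bp
    return {
        "mean_arterial_pressure": v.diastolic_bp + pulse_pressure / 3.0,
        "shock_index": v.heart_rate / v.systolic_bp,
        "pulse_pressure": pulse_pressure,
        "bp_ratio": v.systolic_bp / v.diastolic_bp,  # assumed definition
    }


# Example reading: 120/80 mmHg with a heart rate of 72 bpm.
features = engineered_features(Vitals(systolic_bp=120, diastolic_bp=80, heart_rate=72))
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Because each feature depends only on a single record's own readings, it can be computed before or after the train/test split without leaking information; the oversampling step is the one that must be fitted on the training portion alone.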

Keywords

maternal health risk prediction, machine learning, IoT healthcare data, class imbalance, clinical decision support

Data Availability Statement

The dataset used in this study is publicly available from the UCI Machine Learning Repository. All analysis code, serialized models, notebooks, and figures are available at https://github.com/Shub95-dot/Maternal_health_risk_Project.

Funding

This work received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

AI Use Statement

The authors declare that ChatGPT-5.5 (April 2025 version, OpenAI, San Francisco, CA, USA) was used for language editing and rewriting of parts of the manuscript to improve clarity and effectiveness. The authors have carefully reviewed, revised, and verified all AI-assisted output and take full responsibility for the content of the manuscript.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Shirodkar, S., Hasan, R., & Mahmood, S. (2026). Maternal Health Risk Prediction in Bangladesh Using Machine Learning. Journal of Artificial Intelligence in Bioinformatics, 2(1), 1-21. https://doi.org/10.62762/JAIB.2026.495804

Export Citation

RIS Format

Compatible with EndNote, Zotero, Mendeley, and other reference managers

TY  - JOUR
AU  - Shirodkar, Shubham
AU  - Hasan, Raza
AU  - Mahmood, Salman
PY  - 2026
DA  - 2026/06/30
TI  - Maternal Health Risk Prediction in Bangladesh Using Machine Learning
JO  - Journal of Artificial Intelligence in Bioinformatics
T2  - Journal of Artificial Intelligence in Bioinformatics
JF  - Journal of Artificial Intelligence in Bioinformatics
VL  - 2
IS  - 1
SP  - 1
EP  - 21
DO  - 10.62762/JAIB.2026.495804
UR  - https://www.icck.org/article/abs/JAIB.2026.495804
KW  - maternal health risk prediction
KW  - machine learning
KW  - IoT healthcare data
KW  - class imbalance
KW  - clinical decision support
SN  - 3068-7535
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

Compatible with LaTeX, BibTeX, and other reference managers

@article{Shirodkar2026Maternal,
  author = {Shubham Shirodkar and Raza Hasan and Salman Mahmood},
  title = {Maternal Health Risk Prediction in Bangladesh Using Machine Learning},
  journal = {Journal of Artificial Intelligence in Bioinformatics},
  year = {2026},
  volume = {2},
  number = {1},
  pages = {1-21},
  doi = {10.62762/JAIB.2026.495804},
  url = {https://www.icck.org/article/abs/JAIB.2026.495804},
  keywords = {maternal health risk prediction, machine learning, IoT healthcare data, class imbalance, clinical decision support},
  issn = {3068-7535},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2026 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

Journal of Artificial Intelligence in Bioinformatics

ISSN: 3068-7535 (Online)

Preserved at
Portico
