Application of Machine Learning for Effective Screening of Enhanced Oil Recovery Methods
Abstract
Selecting the most suitable enhanced oil recovery (EOR) technique remains challenging because historical datasets are severely class-imbalanced and traditional screening criteria have known limitations. To address data imbalance while preserving domain knowledge, this study proposes a novel machine learning framework that incorporates domain-informed synthetic data generation strictly constrained by established EOR screening criteria. An initial dataset of 583 documented EOR projects was compiled from field reports and public databases. After rigorous cleaning, 575 valid samples were retained and subsequently augmented to 760 instances with approximately balanced classes (60 to 110 samples per class). Augmentation reduced the imbalance ratio from 123:1 to approximately 1.8:1. The augmented dataset was processed using principal component analysis (PCA) for dimensionality reduction, followed by hyperparameter tuning and 5-fold cross-validation. Among the evaluated models, K-Nearest Neighbors (KNN) and Random Forest achieved the highest macro-averaged performance (F1-scores of 0.89 and 0.85, respectively). The results demonstrate that domain-guided synthetic data generation significantly improves model accuracy and robustness for multi-class EOR screening, offering reservoir engineers a reliable, machine learning-supported decision-making tool.
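The workflow summarized in the abstract (PCA for dimensionality reduction, hyperparameter tuning, 5-fold cross-validation, macro-averaged F1) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are random placeholders from make_classification rather than the EOR dataset, and the number of classes (9), the number of features (8), the standardization step, and the KNN parameter grid are all assumed for illustration. The imbalance_ratio helper is a hypothetical name for the largest-to-smallest class-size ratio quoted in the abstract.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (hypothetical helper)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Placeholder data, NOT the paper's EOR dataset. 760 rows mirrors the
# augmented sample count; 9 classes and 8 features are assumptions.
X, y = make_classification(
    n_samples=760,
    n_features=8,
    n_informative=6,
    n_redundant=2,
    n_classes=9,
    n_clusters_per_class=1,
    random_state=0,
)

# Scaling and PCA sit inside the pipeline so they are refit on each
# training fold during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Assumed search space; the abstract does not list the tuned values.
param_grid = {
    "pca__n_components": [3, 5, 8],
    "knn__n_neighbors": [3, 5, 7],
    "knn__weights": ["uniform", "distance"],
}

# Stratified 5-fold CV keeps class proportions similar in every fold;
# "f1_macro" averages per-class F1 with equal weight per class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=cv)
search.fit(X, y)

print("imbalance ratio:", round(imbalance_ratio(y), 2))
print("best params:", search.best_params_)
print("mean macro-F1:", round(search.best_score_, 3))
```

Keeping the scaler and PCA inside the Pipeline means the held-out fold never influences the projection. As a quick check of the ratio arithmetic, imbalance_ratio([0] * 110 + [1] * 60) gives about 1.83, in line with the roughly 1.8:1 ratio implied by class sizes of 60 to 110.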
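Keywords
EOR screening, machine learning, screening criteria, imbalanced data, multi-class classification, enhanced oil recovery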
Cite This Article
TY  - JOUR
AU  - Ali, Jawad
AU  - Ansari, Ubedullah
AU  - Ali, Fateh
AU  - Javed, Tariq
AU  - Hullio, Imran Ahmed
PY  - 2026
DA  - 2026/02/27
TI  - Application of Machine Learning for Effective Screening of Enhanced Oil Recovery Methods
JO  - Reservoir Science
T2  - Reservoir Science
JF  - Reservoir Science
VL  - 2
IS  - 1
SP  - 65
EP  - 80
DO  - 10.62762/RS.2025.333184
UR  - https://www.icck.org/article/abs/RS.2025.333184
KW  - EOR screening
KW  - machine learning
KW  - screening criteria
KW  - imbalanced data
KW  - multi-class classification
KW  - enhanced oil recovery
AB  - Selecting the most suitable enhanced oil recovery (EOR) technique remains challenging due to severe class imbalance in historical datasets and the limitations of traditional screening criteria. To address data imbalance while preserving domain knowledge, this study proposes a novel machine learning framework that incorporates domain-informed synthetic data generation strictly constrained by established EOR screening criteria. An initial dataset of 583 documented EOR projects was compiled from field reports and public databases. After rigorous cleaning, 575 valid samples were retained and subsequently augmented to 760 balanced instances (class sizes ranging from 60–110 samples per class). This reduced the imbalance ratio from 123:1 to approximately 1.8:1. The augmented dataset was processed using principal component analysis (PCA) for dimensionality reduction, followed by hyperparameter tuning and 5-fold cross-validation. Among the evaluated models, K-Nearest Neighbors (KNN) and Random Forest achieved the highest macro-averaged performance (F1-score of 0.89 and 0.85, respectively). The results demonstrate that domain-guided synthetic data generation significantly improves model accuracy and robustness for multi-class EOR screening, offering reservoir engineers a reliable, machine learning-supported decision-making tool.
SN  - 3070-2356
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -
@article{Ali2026Applicatio,
author = {Jawad Ali and Ubedullah Ansari and Fateh Ali and Tariq Javed and Imran Ahmed Hullio},
title = {Application of Machine Learning for Effective Screening of Enhanced Oil Recovery Methods},
journal = {Reservoir Science},
year = {2026},
volume = {2},
number = {1},
pages = {65--80},
doi = {10.62762/RS.2025.333184},
url = {https://www.icck.org/article/abs/RS.2025.333184},
abstract = {Selecting the most suitable enhanced oil recovery (EOR) technique remains challenging due to severe class imbalance in historical datasets and the limitations of traditional screening criteria. To address data imbalance while preserving domain knowledge, this study proposes a novel machine learning framework that incorporates domain-informed synthetic data generation strictly constrained by established EOR screening criteria. An initial dataset of 583 documented EOR projects was compiled from field reports and public databases. After rigorous cleaning, 575 valid samples were retained and subsequently augmented to 760 balanced instances (class sizes ranging from 60–110 samples per class). This reduced the imbalance ratio from 123:1 to approximately 1.8:1. The augmented dataset was processed using principal component analysis (PCA) for dimensionality reduction, followed by hyperparameter tuning and 5-fold cross-validation. Among the evaluated models, K-Nearest Neighbors (KNN) and Random Forest achieved the highest macro-averaged performance (F1-score of 0.89 and 0.85, respectively). The results demonstrate that domain-guided synthetic data generation significantly improves model accuracy and robustness for multi-class EOR screening, offering reservoir engineers a reliable, machine learning-supported decision-making tool.},
keywords = {EOR screening, machine learning, screening criteria, imbalanced data, multi-class classification, enhanced oil recovery},
issn = {3070-2356},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Copyright © 2026 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.