Scaling AI with Limited Labeled Data: A Self-Supervised Learning Approach

Praveen Kumar Myakala

doi:10.62762/TETAI.2025.607708

Article Information

Published in: ICCK Transactions on Emerging Topics in Artificial Intelligence

Volume/Issue: Volume 2, Issue 1, 2025

Pages: 26-35

Cited by: 12 (Crossref), 11 (Scopus)

Abstract

The scalability of modern AI is fundamentally limited by the availability of labeled data. Supervised learning achieves remarkable performance, but it relies on large annotated datasets that are expensive and time-consuming to acquire. This work explores self-supervised learning (SSL) as a way to scale AI effectively in data-scarce scenarios. The proposed SSL framework is evaluated on the EuroSAT dataset, a benchmark for land cover classification in which labeled data is limited and costly. The approach combines contrastive learning using multi-spectral augmentations, such as spectral jittering and band shuffling, with masked autoencoding that applies spatial-spectral masking based on local variance in the spectral bands, so that it captures both the spatial and the spectral characteristics of EuroSAT imagery. Experimental results show that the proposed SSL-based models achieve 81.2% accuracy with only 10% of the labeled data, outperforming supervised learning by 2.7% and semi-supervised methods by 2.1%. These results indicate that SSL can reduce reliance on labeled data and the associated annotation burden, paving the way for more scalable, accessible, and cost-effective AI deployment in data-constrained environments.
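
As a rough illustration of the augmentations and masking named in the abstract, the sketch below shows one way spectral jittering, band shuffling, and variance-based patch selection might look. The abstract gives no implementation details, so everything here is an assumption: the function names, the per-band random gain used for jittering, the patch size and masking ratio, and in particular the choice to mask the highest-variance patches (the paper may weight or select patches differently). It uses only the Python standard library and a tiny synthetic 3-band image rather than real EuroSAT data.

```python
import random
from statistics import pvariance

# An image is a nested list indexed as img[band][row][col].

def spectral_jitter(img, strength, rng):
    """Scale each band by its own random gain in [1 - strength, 1 + strength].

    Assumed form of "spectral jittering"; not taken from the paper.
    """
    out = []
    for band in img:
        gain = 1.0 + rng.uniform(-strength, strength)
        out.append([[value * gain for value in row] for row in band])
    return out

def band_shuffle(img, rng):
    """Return the same bands in a random order (pixel values are untouched)."""
    order = list(range(len(img)))
    rng.shuffle(order)
    return [img[b] for b in order]

def variance_mask(img, patch, ratio):
    """Select patches to mask, ranked by local variance averaged over bands.

    Returns a set of (patch_row, patch_col) indices. Masking the
    highest-variance patches is an assumption made for this sketch.
    """
    height, width = len(img[0]), len(img[0][0])
    scores = {}
    for pr in range(height // patch):
        for pc in range(width // patch):
            per_band = []
            for band in img:
                values = [band[r][c]
                          for r in range(pr * patch, (pr + 1) * patch)
                          for c in range(pc * patch, (pc + 1) * patch)]
                per_band.append(pvariance(values))
            scores[(pr, pc)] = sum(per_band) / len(per_band)
    n_masked = int(len(scores) * ratio)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_masked])

# Synthetic 3-band, 4x4 image standing in for a multi-spectral tile.
rng = random.Random(0)
image = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(3)]

# Two independently augmented views of the same tile, as a contrastive
# objective would typically consume.
view_a = band_shuffle(spectral_jitter(image, 0.1, rng), rng)
view_b = band_shuffle(spectral_jitter(image, 0.1, rng), rng)

# Patch indices that a masked autoencoder would hide and reconstruct.
masked_patches = variance_mask(image, patch=2, ratio=0.5)
```

In a full pipeline the two views would feed a contrastive loss and the selected patches would be hidden from the encoder of a masked autoencoder; neither model is shown here.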

Keywords

self-supervised learning (SSL), limited labeled data, data-scarce scenarios, contrastive learning, masked autoencoding, scalable AI

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The author declares no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Myakala, P. K. (2025). Scaling AI with Limited Labeled Data: A Self-Supervised Learning Approach. ICCK Transactions on Emerging Topics in Artificial Intelligence, 2(1), 26-35. https://doi.org/10.62762/TETAI.2025.607708

Export Citation

RIS Format

Compatible with EndNote, Zotero, Mendeley, and other reference managers

TY  - JOUR
AU  - Myakala, Praveen Kumar
PY  - 2025
DA  - 2025/03/15
TI  - Scaling AI with Limited Labeled Data: A Self-Supervised Learning Approach
JO  - ICCK Transactions on Emerging Topics in Artificial Intelligence
T2  - ICCK Transactions on Emerging Topics in Artificial Intelligence
JF  - ICCK Transactions on Emerging Topics in Artificial Intelligence
VL  - 2
IS  - 1
SP  - 26
EP  - 35
DO  - 10.62762/TETAI.2025.607708
UR  - https://www.icck.org/article/abs/TETAI.2025.607708
KW  - self-supervised Learning (SSL)
KW  - limited labeled data
KW  - data-scarce scenarios
KW  - contrastive learning
KW  - masked autoencoding
KW  - scalable AI
SN  - 3068-6652
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

Compatible with LaTeX, BibTeX, and other reference managers

@article{Myakala2025Scaling,
  author = {Praveen Kumar Myakala},
  title = {Scaling AI with Limited Labeled Data: A Self-Supervised Learning Approach},
  journal = {ICCK Transactions on Emerging Topics in Artificial Intelligence},
  year = {2025},
  volume = {2},
  number = {1},
  pages = {26--35},
  doi = {10.62762/TETAI.2025.607708},
  url = {https://www.icck.org/article/abs/TETAI.2025.607708},
  keywords = {self-supervised Learning (SSL), limited labeled data, data-scarce scenarios, contrastive learning, masked autoencoding, scalable AI},
  issn = {3068-6652},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics

Citations: 12 (Crossref), 11 (Scopus)

Views: 6734

PDF Downloads: 806

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

ICCK Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3068-6652 (Online)

Preserved at: Portico
