ICCK Transactions on Sensing, Communication, and Control, Volume 3, Issue 1, 2026: 27-38

Free to Read | Research Article | 14 February 2026
Context Refinement with Multi-Attention Fusion for Saliency Segmentation Using Depth-Aware RGBD Sensing
1 Global Degree College, Peshawar 25000, Pakistan
2 School of Computing, University of Eastern Finland, Joensuu 80100, Finland
* Corresponding Author: Abdurrahman Khan, [email protected]
ARK: ark:/57805/tscc.2025.587957
Received: 09 December 2025, Accepted: 12 January 2026, Published: 14 February 2026  
Abstract
Salient object detection in RGB-D imagery remains challenging due to inconsistent depth quality and suboptimal cross-modal fusion strategies. This paper presents a novel dual-stream architecture that integrates contextual feature refinement with adaptive attention mechanisms for robust RGB-D saliency detection. We extract features at two levels from a ResNet-50 backbone for both the RGB and depth streams, capturing low-level spatial details and high-level semantic representations. We introduce a Contextual Feature Refinement Module (CFRM) that captures multi-scale dependencies through parallel dilated convolutions, enabling hierarchical context aggregation without substantial computational overhead. To enhance discriminative feature learning, we employ channel attention for inter-channel recalibration and a modified spatial attention mechanism that uses quadruple feature statistics for precise localization. Recognizing that the depth maps in existing benchmark datasets are outdated and degraded in quality, we introduce refined depth maps generated with Depth Anything V2, which significantly improve cross-modal alignment and detection performance. A progressive fusion strategy integrates complementary RGB and depth information across semantic hierarchies, while the saliency prediction block generates high-resolution predictions via gradual spatial expansion. Extensive experiments on six benchmark datasets validate our approach, which achieves performance competitive with recent state-of-the-art methods.
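
Implementation Sketch (illustrative)
The abstract describes several architectural components. To make two of them concrete, the following minimal PyTorch sketch illustrates (a) a contextual refinement block that aggregates multi-scale context with parallel dilated convolutions and (b) a spatial attention map computed from four channel-wise statistics. The dilation rates (1, 2, 4, 8), the particular statistics (max, mean, min, standard deviation), and the class names are assumptions made for illustration only, not the exact configuration reported in the paper.

import torch
import torch.nn as nn

class ContextualFeatureRefinement(nn.Module):
    # Illustrative CFRM-style block: parallel dilated 3x3 convolutions (assumed
    # rates 1, 2, 4, 8) whose outputs are concatenated, fused by a 1x1 convolution,
    # and added back to the input as a residual refinement.
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert channels % len(dilations) == 0, "channels must split evenly across branches"
        branch_ch = channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, branch_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        context = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(context)

class QuadStatSpatialAttention(nn.Module):
    # Illustrative "quadruple statistics" spatial attention: four channel-wise
    # pooled maps (assumed to be max, mean, min, std) are stacked and turned into
    # a single-channel attention map by a 7x7 convolution and a sigmoid.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(4, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        stats = torch.cat([
            x.max(dim=1, keepdim=True).values,
            x.mean(dim=1, keepdim=True),
            x.min(dim=1, keepdim=True).values,
            x.std(dim=1, keepdim=True),
        ], dim=1)
        return x * torch.sigmoid(self.conv(stats))

Applied to a backbone feature map (for example, a 256-channel ResNet-50 stage), the refinement block preserves the channel width while enlarging the receptive field, and the attention block reweights spatial locations, a pattern consistent with the progressive fusion described above.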

Graphical Abstract
Context Refinement with Multi-Attention Fusion for Saliency Segmentation Using Depth-Aware RGBD Sensing

Keywords
RGB-D saliency
attention mechanisms
multi-modal fusion
depth refinement
contextual features

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

AI Use Statement
The authors declare that no generative AI was used in the preparation of this manuscript.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Khan, A., & Shah, H. A. (2026). Context Refinement with Multi-Attention Fusion for Saliency Segmentation Using Depth-Aware RGBD Sensing. ICCK Transactions on Sensing, Communication, and Control, 3(1), 27–38. https://doi.org/10.62762/TSCC.2025.587957

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Sensing, Communication, and Control
ISSN: 3068-9287 (Online) | ISSN: 3068-9279 (Print)
Email: [email protected]

All published articles are preserved permanently in Portico:
https://www.portico.org/publishers/icck/