Relaxed Bounding Boxes for Object Detection

Daniel Aioanei

doi:10.62762/JIAP.2025.507329

Volume 1, Issue 3, ICCK Journal of Image Analysis and Processing

ICCK Journal of Image Analysis and Processing, Volume 1, Issue 3, 2025: 107-124

Open Access | Research Article | 17 September 2025

Daniel Aioanei 1 *

1 Independent Scientist, 8400 Winterthur, Switzerland

* Corresponding Author: Daniel Aioanei

Received: 08 August 2025, Accepted: 07 September 2025, Published: 17 September 2025

Abstract

The Generalized Intersection over Union (GIoU) and the Manhattan distance between axis-aligned boxes, with the boxes represented either by their corner coordinates or by their center and size, are extended to accept a range of bounding boxes as ground truth, producing the metrics RIoU, $R_1$ and $R^t_1$, respectively. In the context of Table Detection, this box relaxation procedure is shown to allow training object detection models with partial or inexact annotations. For the Table Structure Recognition task, several code improvements to Microsoft's open-source Table Transformer increase all $\mathrm{GriTS}$ metrics on PubTables-1M, with the overall accuracy rising from 0.8326 to 0.8433. Box relaxation is then applied so that the object detection loss function takes advantage of the discretizing nature of the post-inference procedure that extracts the table cell matrix. This further reduces the error of the $\mathrm{GriTS}$ metrics $Acc_{Con}$, $GriTS_{Con}$, $GriTS_{Loc}$ and $GriTS_{Top}$ on the PubTables-1M tables without spanning cells by 1.8%, 13.2%, 10.6% and 14.9%, respectively.
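To make the idea of a ground-truth range concrete, the following is a minimal Python sketch. The `giou` function follows the standard GIoU definition for axis-aligned boxes in corner format. The `relaxed_l1` function is only an illustrative assumption: it treats the ground truth as any box lying between a hypothetical inner box and outer box and charges a corner coordinate only for its distance to the allowed interval. It is not taken from the article, and the exact definitions of RIoU, $R_1$ and $R^t_1$ may differ.

```python
from typing import Tuple

# Axis-aligned box in corner format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Standard GIoU: IoU minus the fraction of the hull not covered by the union."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0.0 or hull <= 0.0:
        raise ValueError("GIoU is undefined when both boxes have zero area")
    return inter / union - (hull - union) / hull


def interval_distance(v: float, lo: float, hi: float) -> float:
    """Distance from v to the closed interval [lo, hi]; 0 if v lies inside."""
    return max(lo - v, 0.0, v - hi)


def relaxed_l1(pred: Box, inner: Box, outer: Box) -> float:
    """Hypothetical relaxed Manhattan distance (an assumption, not the paper's R_1).

    Any box that contains `inner` and is contained in `outer` counts as
    correct, so each predicted coordinate is penalized only by its distance
    to the interval spanned by the matching inner and outer coordinates.
    """
    lows = (outer[0], outer[1], inner[2], inner[3])
    highs = (inner[0], inner[1], outer[2], outer[3])
    return sum(interval_distance(p, lo, hi)
               for p, lo, hi in zip(pred, lows, highs))
```

In this sketch, setting `inner` equal to `outer` collapses every interval to a single point, so `relaxed_l1` reduces to the ordinary Manhattan distance between corner coordinates. Widening the gap between the two boxes enlarges the region in which a prediction incurs no penalty.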

Keywords

object detection

table detection

table structure recognition

bounding box regression

loss function

Data Availability Statement

The source code used to generate the results reported here is available under an open-source license at https://github.com/aioaneid/table-transformer. No new training data were collected for this study.

Funding

This work received no funding.

Conflicts of Interest

The author declares no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Aioanei, D. (2025). Relaxed Bounding Boxes for Object Detection. ICCK Journal of Image Analysis and Processing, 1(3), 107–124. https://doi.org/10.62762/JIAP.2025.507329

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

ICCK Journal of Image Analysis and Processing

ISSN: 3068-6679 (Online)
