Abstract
With the rapid development of multimodal large language models (MLLMs), the demand for structured event extraction (EE) in the field of scientific and technological intelligence is increasing. However, significant challenges remain in zero-shot multimodal and cross-lingual scenarios, including inconsistent cross-lingual outputs and the high computational cost of full-parameter fine-tuning. This study takes VideoLLaMA2 (VL2) and its improved version VL2.1 as the core models and builds a multimodal annotated dataset covering English, Chinese, Spanish, and Russian (5,728 EE samples). It systematically evaluates the performance differences between zero-shot learning and parameter-efficient fine-tuning (QLoRA). The experimental results show that applying QLoRA fine-tuning to VL2 and VL2.1 raises trigger accuracy to 65.48% and argument accuracy to 60.54%. The study confirms that fine-tuning significantly enhances model robustness.
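For readers unfamiliar with the setup, the sketch below illustrates how QLoRA-style parameter-efficient fine-tuning is typically configured with the Hugging Face transformers, bitsandbytes, and peft libraries: the base model is loaded in 4-bit NF4 quantization and only low-rank adapter weights are trained. The checkpoint id, LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not the values reported in this study.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Hypothetical checkpoint id for illustration; VideoLLaMA2-family models
# generally require trust_remote_code to load their custom architecture.
model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-SG/VideoLLaMA2-7B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; r/alpha/dropout and the
# target module list are assumed hyperparameters, not the paper's settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Typically well under 1% of parameters are trainable, which is what makes
# fine-tuning tractable compared with full-parameter updates.
model.print_trainable_parameters()
```

The appeal of this recipe for the cross-lingual EE setting described above is that the quantized base model's multilingual and multimodal knowledge stays frozen, while the small adapter absorbs the task-specific trigger and argument extraction behavior.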
Data Availability Statement
Data will be made available on request.
Funding
This work was supported by the National Key Research and Development Program under Grant 2022YFB3103602.
Conflicts of Interest
The authors declare no conflicts of interest.
Ethical Approval and Consent to Participate
Not applicable.
Cite This Article
APA Style
Hong, S., Wang, X., Mei, Z., & Wickramaratne, T. B. (2025). Cross-Lingual Multimodal Event Extraction: A Unified Framework for Parameter-Efficient Fine-Tuning. ICCK Transactions on Intelligent Systematics, 2(4), 203–212. https://doi.org/10.62762/TIS.2025.610574
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.