Abstract: Multimodal large language models (MLLMs) have made remarkable progress across various domains, excelling in tasks such as question answering, segmentation, and detection. However, their ...