VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

by Qing’an Liu et al.

Feb 11, 2026 · 07:10

VISTA-Bench, Vision-Language Models (VLMs), Modality Gap, Visualized Text