Posts tagged 'VQA'

VQA

[Paper Review] LLaVA 1.5: Improved Baselines with Visual Instruction Tuning
https://arxiv.org/abs/2310.03744
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple mo..
