VQA

[Paper Review] LLaVA 1.5: Improved Baselines with Visual Instruction Tuning

Improved Baselines with Visual Instruction Tuning
https://arxiv.org/abs/2310.03744

Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple mo...