Paper Review

[Paper Review] LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
https://llava-vl.github.io/blog/2024-01-30-llava-next/
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge. LLaVA team presents LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.
https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
LLaVA-NeXT:..

[Paper Review] LLaVA 1.5: Improved Baselines with Visual Instruction Tuning
https://arxiv.org/abs/2310.03744
Improved Baselines with Visual Instruction Tuning. Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple mo..
Li..

[Paper Review] LLaVA: Visual Instruction Tuning
https://arxiv.org/abs/2304.08485
Visual Instruction Tuning. Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use l..
Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visu..

[Paper Review] STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
https://arxiv.org/abs/2503.07942
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications. This paper presents a new method for anomaly detection in automated systems with time and compute sensitive requirements, such as autonomous driving, with unparalleled efficienc..

[Paper Review] JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos
https://arxiv.org/abs/2405.02961
JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos. The increasing proliferation of video surveillance cameras and the escalating demand for crime prevention have intensified interest in the task of violence detection within the research commu..