All Posts

[MILA] Mila - Quebec AI Institute Visiting Researcher Review
Intro: MILA, located in Montreal, Quebec, Canada, is an AI research institute founded by Yoshua Bengio, one of the so-called "Big Four" of AI. I work at a Seoul AI Hub partner company, which was recently selected as a global joint-research partner, and this gave me the opportunity to collaborate with a researcher at MILA. Since I knew little about MILA before going to Montreal and there seem to be few Korean-language resources about it, I would like to share my firsthand account of the visit along with an introduction to the institute.
https://mila.quebec/en/about/about-mila
"Founded in 1993 by Yoshua Bengio, Mila is the result of a unique collaboration..."

[Paper Review] LLaVA-Video: Video Instruction Tuning With Synthetic Data
https://arxiv.org/abs/2410.02713
"The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset..."

[Paper Review] LLaVA-OneVision: Easy Visual Task Transfer
https://arxiv.org/abs/2408.03326
"We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is..."

[Paper Review] LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
https://llava-vl.github.io/blog/2024-01-30-llava-next/
https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
"LLaVA team presents LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks."

[Paper Review] LLaVA 1.5: Improved Baselines with Visual Instruction Tuning
https://arxiv.org/abs/2310.03744
"Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple..."

[Paper Review] LLaVA: Visual Instruction Tuning
https://arxiv.org/abs/2304.08485
"Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use..."
Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual Instruction Tuning.

[Paper Review] STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
https://arxiv.org/abs/2503.07942
"This paper presents a new method for anomaly detection in automated systems with time and compute sensitive requirements, such as autonomous driving, with unparalleled efficiency..."
[Paper Review] JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos
https://arxiv.org/abs/2405.02961
"The increasing proliferation of video surveillance cameras and the escalating demand for crime prevention have intensified interest in the task of violence detection within the research community..."