Paper link: CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (arxiv.org)
Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. The CLIP (Contrastive Language-Image Pre-training), an image-language pre-training model, has demonstrated the power of visua..

CLIP works by computing the similarity between an image and a text. In general, Video Retrieva..