Our model is the first CLIP model made for the Italian language. With our project, we tried to provide another resource for the Italian NLP and CV community. You can read more about this in the original paper.

CLIP shows incredible zero-shot performance on datasets like ImageNet without seeing a single training sample. When projected into the shared space, the image of a cat and the label "cat" will be close (under some distance metric in that space).

To train CLIP in Italian we need one thing: data. In particular, we need images with captions that describe them.

In this article, I'll first go over a general introduction to how CLIP works (Section 1). After that, I'll describe in a bit more detail how we trained it to cover the Italian language (Section 2). I will try to stay at a high level of abstraction, but at the same time, I'll try to share all the information needed to understand how this model does its job. If you prefer the video format, I gave a talk at LightOn AI in September; you can find the video right here:
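To make the shared-space idea more concrete, here is a minimal sketch of the comparison step: an image embedding and several caption embeddings live in the same vector space, and the caption with the highest cosine similarity to the image "wins". The vectors below are toy values invented for illustration, not real CLIP outputs.

```python
import numpy as np

# Hypothetical embeddings: in CLIP, an image encoder and a text encoder
# project both modalities into the same space. These toy vectors only
# illustrate the similarity comparison, not the encoders themselves.
image_embedding = np.array([0.9, 0.1, 0.2])  # stands in for a photo of a cat
text_embeddings = {
    "a photo of a cat": np.array([0.8, 0.2, 0.1]),
    "a photo of a dog": np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {
    label: cosine_similarity(image_embedding, vec)
    for label, vec in text_embeddings.items()
}
best_label = max(scores, key=scores.get)
print(best_label)  # the caption closest to the image in the shared space
```

This is also why zero-shot classification works: you can score an image against any set of candidate captions, without ever training on those classes.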