How to capture an animal in just a few strokes is something many people have struggled with since childhood. Now AI can do it too. In the picture below, three animal photos are on the left, and on the right an AI describes their shapes and expressions using only lines. Going from 32 strokes down to 4, a lot of information is abstracted away, yet the animals remain identifiable, especially the cat at the bottom, whose charm comes through in just 4 curved strokes.
Look at the horse again: in the most abstract version only the head, mane and hooves remain, somewhat like Picasso’s bull.
Even more surprising, the model behind it, CLIPasso, was not trained on any sketch dataset. In other words, it never “learned” how to draw abstract sketches, yet it can produce a simple sketch from an input image.
Everyday sketches are already quite abstract. Even a person needs long practice to capture the “soul” of an object. So how does CLIPasso capture the “soul” of an abstract sketch without ever training on a sketch dataset?
How CLIPasso draws abstract sketches
In fact, drawing abstract sketches is harder for AI than for people. For an abstract sketch to look “like” its subject, it must both capture the semantics accurately and stay geometrically similar to the original.
Concretely, the model first generates initial stroke positions from a feature map of the image, then uses CLIP to build two loss functions: one controls the sketch’s geometric similarity to the image, the other its semantic fidelity.
CLIP is an image-text matching model released by OpenAI. It scores how well images match a piece of text, so it can rank candidate pictures and pick out the best match. With that in mind, the overall structure of CLIPasso becomes clearer.
For example, to draw a horse, we first place some initial strokes (S1, S2, …, Sn) using the saliency feature map. A rasterizer then projects the strokes onto the image plane.
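The initialization step can be illustrated with a toy example. This is only a sketch under stated assumptions, not CLIPasso’s actual code: the article does not say how positions are derived from the saliency map, so the example simply samples stroke start positions with probability proportional to saliency. The function name `sample_stroke_starts` and the 4x4 map are hypothetical.

```python
import random


def sample_stroke_starts(saliency, n_strokes, seed=0):
    """Pick n_strokes (row, col) positions, favouring high-saliency pixels.

    Assumption: sampling proportional to saliency, with replacement.
    """
    rng = random.Random(seed)
    # Flatten the 2D map into parallel lists of positions and weights.
    positions = [(r, c) for r, row in enumerate(saliency) for c in range(len(row))]
    weights = [value for row in saliency for value in row]
    if sum(weights) <= 0:
        raise ValueError("saliency map must contain at least one positive value")
    return rng.choices(positions, weights=weights, k=n_strokes)


# Hypothetical saliency map: only the central 2x2 block is salient.
saliency = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

starts = sample_stroke_starts(saliency, n_strokes=4)
print(starts)
```

Because pixels with zero saliency have zero weight, every sampled start lands inside the salient block, so the initial strokes begin on the subject rather than the background.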
The next step is to optimize the stroke parameters. The rasterized sketch and the original image are fed into the CLIP model to compute a geometric loss (Lg) and a semantic loss (Ls). The semantic loss measures the difference between the two images by cosine similarity, while the geometric loss compares them at CLIP’s intermediate layers.
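The two losses can be sketched on made-up numbers. The following is a minimal illustration, assuming the semantic loss is one minus the cosine similarity of the final embeddings and the geometric loss is a sum of squared L2 distances between intermediate-layer activations. The article only states that cosine similarity and intermediate layers are involved, so the exact distance functions, the `semantic_weight` parameter, and all activation values here are assumptions. No real CLIP model is called.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_loss(image_embedding, sketch_embedding):
    """Ls: near 0 when the two final embeddings point the same way."""
    return 1.0 - cosine_similarity(image_embedding, sketch_embedding)


def geometric_loss(image_layers, sketch_layers):
    """Lg: squared L2 distance summed over matching intermediate layers."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(img, sk))
        for img, sk in zip(image_layers, sketch_layers)
    )


def total_loss(lg, ls, semantic_weight=1.0):
    """Weighted sum of the two terms (the weighting is an assumption)."""
    return lg + semantic_weight * ls


# Made-up activations standing in for CLIP outputs on the photo and the sketch.
image_embedding = [1.0, 0.0, 1.0]
sketch_embedding = [1.0, 0.0, 1.0]
image_layers = [[0.5, 0.5], [1.0, 2.0]]
sketch_layers = [[0.5, 0.0], [1.0, 1.0]]

ls = semantic_loss(image_embedding, sketch_embedding)
lg = geometric_loss(image_layers, sketch_layers)
print(ls, lg, total_loss(lg, ls))
```

In this toy case the final embeddings agree, so the semantic term is essentially zero, while the intermediate activations differ and the geometric term is positive. That mirrors a sketch that is recognizably "a horse" but whose shape does not yet match the photo.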
Together, the two losses keep the geometry accurate while preserving the semantics, and the stroke parameters are adjusted repeatedly through backpropagation until the loss converges. So how is the degree of abstraction of the sketch controlled?
By setting the number of strokes. A horse drawn with 32 strokes and one drawn with only 4 strokes end up at clearly different levels of abstraction.
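To see why the stroke count acts as the abstraction knob, it helps to count what the optimizer has to work with. The sketch below assumes each stroke is a cubic Bézier curve with four 2D control points. The article only says the strokes are curves, so this representation and the helper names are assumptions for illustration.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    s = 1.0 - t
    x = s**3 * p0[0] + 3 * s**2 * t * p1[0] + 3 * s * t**2 * p2[0] + t**3 * p3[0]
    y = s**3 * p0[1] + 3 * s**2 * t * p1[1] + 3 * s * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def count_parameters(n_strokes, points_per_stroke=4, dims=2):
    """Number of coordinates the optimizer can adjust for a given sketch."""
    return n_strokes * points_per_stroke * dims


# One hypothetical stroke: an arch from (0, 0) to (1, 0).
stroke = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(cubic_bezier(*stroke, 0.5))

# The two extremes mentioned in the article.
for n in (4, 32):
    print(n, "strokes:", count_parameters(n), "parameters")
```

Under these assumptions a 4-stroke sketch has an eighth of the free parameters of a 32-stroke one, so the optimizer has to spend each stroke on the most important features of the subject.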
Finally, let’s look at how recognizable CLIPasso’s sketches are. The bar chart in the figure below shows the recognition accuracy for five types of animals. There is also a sixth option when guessing: none of the five animals.
The figure shows that for every animal, recognition is very low at the highest level of abstraction (4 strokes) and rises steadily as more strokes are added. That is to be expected: a sketch this abstract is naturally hard to recognize.
In a second round of testing, however, the team removed the sixth option, so every sketch had to be assigned to one of the five animal types. The bar chart below shows that recognition then improves considerably even for the highly abstract 4-stroke sketches, from 36% to 76%.
This suggests that the earlier misses came from the sketches being too abstract to name with confidence, and that the AI Picasso’s drawings still capture the core features of the animals.
The model currently has a Colab version: add the picture you want to abstract to the folder on the left, then run the three sections to get the output sketch.
About the Authors
The CLIPasso team members come mainly from EPFL and Tel Aviv University, among other institutions. Among them, Jessica is a master’s student in robotics at ETH Zurich and is currently an intern at VILAB, EPFL’s computer vision laboratory.
Yael Vinker is a PhD student in computer science at Tel Aviv University with a strong interest in the intersection of art and technology. No wonder CLIPasso has such artistic flair.