Train Your Own Image Classification AI with a GeForce RTX 4060 Ti: A Practical Guide to the YOLO Model


AI chatbots have been with us for almost two years, and for many people they are already as much of a revolution as the mass arrival of the Internet thirty years ago. The so-called large language models may one day solve even humanity's most complex problems, but for now they are still backed up by a whole constellation of much simpler neural networks.

An AI model a thousand times smaller will not compose a poem about Brno or paint Pope Francis in a swimming pool with cute nuns, but on the other hand, it can run even on the relatively simple processor of a security camera.

First, it needs some hardware

In the next installment of our electronics programming series, we'll try it out in practice and train our own image classification neural network. And so that we don't spend our entire late puberty training it, we'll call on a graphics card for help: the GeForce RTX 4060 Ti Windforce OC 16G.

The GPU with the Nvidia GeForce RTX 4060 Ti chip, in the version with 16 GB of memory, will be the reference machine for our experiments with basic neural networks and generative AI in the coming weeks and months.

The graphics processor from Nvidia is armed with 4,352 CUDA cores and 16 GB of fast memory, so machine learning can be dramatically parallelized: divided into subtasks that are processed side by side at the same time.

When you have 4 thousand instead of 12 cores in your computer

Once upon a time, this approach made possible the amazingly fast rendering of 3D polygons in PC games and ever more photorealistic exploding brains in the Battlefield series, but clever minds soon discovered that, with special algorithms, the tiny CUDA processors could handle other tasks just as well.

A desktop Xeon is still a Xeon, and even though my E5-1650 v4 is very old, its 12 logical cores (6 physical cores with Hyper-Threading) at 3.6 GHz still get a ton of work done. But the GeForce RTX 4060 Ti has 4,352 computing cores!

Thanks to massive parallelism, they can sort an array of 200 million random numbers in, say, a ridiculous 130 milliseconds. For comparison, the same task takes about 5,600 milliseconds on my aging desktop Xeon (Intel Xeon E5-1650 v4), which is roughly 43 times longer.
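To put the two numbers side by side, here is a minimal sketch that computes the speedup from the figures quoted above and times a plain CPU sort for comparison. The function names are our own, and the array is deliberately much smaller than 200 million so that the script finishes quickly on any machine; the timing you see will depend on your hardware.

```python
import random
import time


def speedup(slow_ms: float, fast_ms: float) -> float:
    """Return how many times faster the fast run is than the slow one."""
    return slow_ms / fast_ms


def time_cpu_sort(n: int, seed: int = 0) -> float:
    """Sort n random floats on the CPU and return the elapsed time in ms."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    data.sort()
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Figures quoted in the article: 5,600 ms on the Xeon, 130 ms on the GPU.
    print(f"GPU speedup: about {speedup(5600, 130):.0f}x")
    # Assumption: one million elements instead of the article's 200 million.
    print(f"CPU sort of 1,000,000 floats: {time_cpu_sort(1_000_000):.0f} ms")
```

Note that this times a single-threaded Python sort, so it only illustrates the measurement, not the exact benchmark behind the numbers above.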

So now we all know why we train neural networks on graphics cards. They have thousands of tiny computing cores, so they can very quickly solve any problem that we can split among them. Although the cores of a regular desktop CPU may be faster individually and are of course much more versatile, there are simply too few of them under the hood.

YOLO 11th generation

Enough theory, let's practice! It's 2024 and plenty of proven technology is available, so we don't have to reinvent the wheel. We will build our image detector on the YOLO neural network from Ultralytics, which is now in its 11th generation, is by far the most popular in its field, and can handle five basic tasks:

  • object detection,
  • instance segmentation,
  • image classification,
  • pose estimation,
  • oriented bounding box (OBB) detection.

Each of the variants is also available in several pretrained sizes, from 1.6 million to 62 million parameters. As a reminder, the number of parameters corresponds to the complexity of the trained neural network, and large language models can have hundreds of millions to billions of them. That is why they need a supercomputer to run, while you can run a neural network with a few million parameters on (almost) anything.
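A bit of back-of-the-envelope arithmetic shows why the parameter count matters so much. The sketch below assumes that every parameter is stored as a 32-bit float (4 bytes), which is our assumption rather than a property of any particular model file, and the 7-billion-parameter language model is a hypothetical example added only for scale.

```python
def weights_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


if __name__ == "__main__":
    examples = [
        ("smallest YOLO11 variant", 1_600_000),
        ("largest YOLO11 variant", 62_000_000),
        ("hypothetical 7-billion-parameter LLM", 7_000_000_000),
    ]
    for label, params in examples:
        # Assumption: 4 bytes per parameter (32-bit floats).
        print(f"{label}: about {weights_size_mb(params):,.0f} MB of weights")
```

Under these assumptions the smallest YOLO needs only a few megabytes for its weights, while the language model needs tens of gigabytes before it processes a single word.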

Basic capabilities and variants of the YOLO image neural network

We can use the finished YOLO directly in our code thanks to the absolutely foolproof Ultralytics library for Python. Such a program really needs only a few lines of code, because we just send a JPEG image to the model and it soon spits out what it sees in it.
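Those few lines could look roughly like the following sketch. It assumes the `ultralytics` package is installed and uses the pretrained detection weights `yolo11n.pt`; the image path and the helper names `format_detection` and `describe_image` are hypothetical and exist only for this illustration.

```python
def format_detection(label: str, confidence: float, box_xyxy: list[float]) -> str:
    """Turn one detection into a human-readable sentence fragment."""
    x1, y1, x2, y2 = box_xyxy
    return (f"{label} ({confidence:.0%}) at x={x1:.0f}, y={y1:.0f}, "
            f"size {x2 - x1:.0f}x{y2 - y1:.0f} px")


def describe_image(image_path: str, weights: str = "yolo11n.pt") -> list[str]:
    """Run a pretrained YOLO detector on one image and describe what it found."""
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    model = YOLO(weights)          # pretrained weights are fetched on first use
    result = model(image_path)[0]  # one result object per input image
    lines = []
    for box in result.boxes:
        label = result.names[int(box.cls[0])]  # class index -> class name
        lines.append(format_detection(label, float(box.conf[0]), box.xyxy[0].tolist()))
    return lines


if __name__ == "__main__":
    # Hypothetical file name; replace it with a real JPEG on your disk.
    for line in describe_image("kitchen.jpg"):
        print(line)
```

The model itself does all the work; everything else is just formatting the boxes, class names and confidences it returns.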

Because the model is small, however, it is trained to recognize only a limited number of objects. What if we need it to detect something of our own that is missing from its training data?

Retraining a network is like a toddler seeing a new toy: it doesn't learn completely from scratch.

This is a task for our GPU (or, of course, for a classic CPU, although training will then be many times slower). We can easily retrain any YOLO from the menu on our own data.

I won’t go into too much depth, but the trick of retraining a neural network is that we don’t have to train it from scratch; we only change its last layer. What does this mean in practice?

If we were to teach a neural network to see from scratch, it would be extremely difficult. The network has to model everything gradually, like a baby that opens its eyes for the first time and begins to perceive different colors and the boundaries between them, and only later begins to distinguish and identify different objects.

When retraining YOLO, we do not have to teach the network to see again. Instead, we take a toddler who already knows that the five cheesy goofballs above the crib are toys from Temu and present him with a few new ones that he has not seen yet.

The poor kid will use his existing abilities and will be able to identify them relatively quickly. In essence, YOLO and other image AIs work quite similarly.
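In Ultralytics terms, retraining could look something like the sketch below. It starts from the pretrained classification weights `yolo11n-cls.pt` and continues training on our own folder of images. The dataset path, the number of epochs and the helper names are assumptions chosen for illustration, not recommended settings.

```python
def build_train_args(data_dir: str, epochs: int = 20, use_gpu: bool = True) -> dict:
    """Collect the training settings in one place so they are easy to tweak."""
    return {
        "data": data_dir,                   # folder with the training images
        "epochs": epochs,                   # assumption: 20 passes over the data
        "imgsz": 224,                       # input image size in pixels
        "device": 0 if use_gpu else "cpu",  # 0 = first CUDA GPU
    }


def retrain(data_dir: str, weights: str = "yolo11n-cls.pt", **overrides):
    """Start from pretrained weights and continue training on our own data."""
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    model = YOLO(weights)  # the "toddler" that can already see
    args = build_train_args(data_dir)
    args.update(overrides)
    return model.train(**args)


if __name__ == "__main__":
    # Hypothetical dataset folder; see the article for what it should contain.
    retrain("datasets/weather")
```

Passing `use_gpu=False` to `build_train_args` switches the same run to the CPU, which works but, as noted above, takes many times longer.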

We will make a weather detector

By far the simplest role of YOLO models is classification. In this case, the AI does not learn exactly where an object is in the image so that we could mark it with a rectangle and work with it further; it only determines whether the image corresponds to a given class, and with what probability.

The classification variant of YOLO 11 is the simplest of all, which is why we will use it in our experiment.

So while the detection AI will say:

  • I see a mug with 95% probability at position (135, 45), and its dimensions are 543×256 px,

the classification AI will say:

  • I see a mug with 95% probability.

Classification is less computationally demanding, and in many cases it is more than enough, including for today’s assignment. We will try to train a weather detector that classifies whether a photo shows a clear, partly cloudy or cloudy sky. It won’t have to pinpoint where exactly the sky is, because that’s completely irrelevant.
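For a classifier like this, the training data is simply a set of photos sorted into one folder per class. The sketch below assumes the layout that Ultralytics classification training expects, with `train` and `val` subfolders that each contain one folder per class, and splits a pile of already sorted photos accordingly. The class folder names, the 80/20 split and the function name are our own assumptions.

```python
import random
import shutil
from pathlib import Path

# Hypothetical class folder names for the three kinds of sky.
CLASSES = ("clear", "partly_cloudy", "cloudy")


def split_dataset(source: Path, dest: Path, val_fraction: float = 0.2, seed: int = 0) -> dict:
    """Copy source/<class>/*.jpg into dest/train/<class> and dest/val/<class>."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for name in CLASSES:
        images = sorted((source / name).glob("*.jpg"))
        rng.shuffle(images)
        n_val = int(len(images) * val_fraction)  # first n_val images go to val
        for i, image in enumerate(images):
            subset = "val" if i < n_val else "train"
            target = dest / subset / name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image, target / image.name)
        counts[name] = {"train": len(images) - n_val, "val": n_val}
    return counts


if __name__ == "__main__":
    # Hypothetical paths: photos sorted by hand in, training layout out.
    print(split_dataset(Path("raw_photos"), Path("datasets/weather")))
```

The validation photos are never used for learning; they only tell us how well the network handles skies it has not seen before.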

Our trained neural network in action, determining the type of weather in the input video. The sky gradually worsens, so the number of frames in which the retrained YOLO sees a partly cloudy or cloudy sky increases.

We could then use a network trained in this way on, for example, a Raspberry Pi with a camera that takes a picture of the view from your window every ten minutes and stores information about what the weather was like. Or, if you’re not into amateur meteorology, it could go through a years-long archive of your photos on a NAS somewhere and save a keyword describing the weather into each picture’s EXIF data, so that all the good shots are easy to find.
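The window-watching variant might be sketched as follows. The script classifies one photo and appends the result to a CSV log; taking the picture and running the script every ten minutes (for example from cron) is left out. The weights file `weather-cls.pt`, the image name and the log name are hypothetical placeholders.

```python
import csv
from datetime import datetime
from pathlib import Path


def append_observation(log_path: Path, when: datetime, label: str, confidence: float) -> None:
    """Append one weather observation to a CSV log, adding a header if needed."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "weather", "confidence"])
        writer.writerow([when.isoformat(timespec="seconds"), label, f"{confidence:.2f}"])


def classify_weather(image_path: str, weights: str = "weather-cls.pt") -> tuple:
    """Return the most likely class name and its probability for one photo."""
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    result = YOLO(weights)(image_path)[0]
    top = result.probs.top1  # index of the most probable class
    return result.names[top], float(result.probs.top1conf)


if __name__ == "__main__":
    # Hypothetical file names; the photo would come from the Pi's camera.
    label, confidence = classify_weather("window.jpg")
    append_observation(Path("weather_log.csv"), datetime.now(), label, confidence)
```

After a few days the log holds a simple history of the sky over your window, ready to be opened in any spreadsheet.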

By the way, cloud AI photo galleries such as Google Photos and others work on a similar principle.
