AI chatbots have been with us for almost two years, and for many people they already represent a revolution on the scale of the mass arrival of the Internet thirty years ago. The so-called large language models may one day solve even the most complex problems facing humanity, but for now a whole constellation of much simpler neural networks still works alongside them.
An AI model a thousand times smaller won't compose a poem about Brno or paint Pope Francis in a swimming pool with cheerful nuns, but on the other hand it can run on the relatively simple processor of a security camera.
First, we need some hardware
In the next installment of our electronics programming series, we'll try it out in practice and train our own image classification neuron. And so that we don't spend our entire late adolescence waiting for it, we'll call on a graphics card for help: the GeForce RTX 4060 Ti Windforce OC 16G.
The graphics card with the Nvidia GeForce RTX 4060 Ti chip, in the 16GB version, will serve as the reference machine for our experiments with basic neural networks and generative AI in the coming weeks and months
Nvidia's graphics processor is armed with 4,352 CUDA computing cores and 16 GB of fast memory, so machine learning can be dramatically parallelized, that is, divided into subtasks that are processed side by side at the same time.
When your computer has 4,000 cores instead of 12
Once upon a time, this approach made for amazingly fast rendering of 3D polygons in PC games and ever more photorealistic exploding brains in the Battlefield series, but clever minds soon discovered that, with specially written algorithms, the small CUDA processors could do plenty of other things just as well.
A desktop Xeon is just a Xeon, and even though my E5-1650 v4 is very old, 12 cores @ 3.6GHz will still do a ton of work. But the GeForce RTX 4060 Ti has 4,352 computing cores!
Thanks to massive parallelism, they can sort an array of 200 million random numbers in, say, a ridiculous 130 milliseconds. Just for comparison, on my aging twelve-core desktop Xeon (Intel Xeon E5-1650 v4), the same task takes about 5,600 milliseconds.
So now we all know why we train neural networks on graphics cards. They have thousands of tiny computing cores, so they can very quickly solve any problem that we can break down across them. Although the cores of a regular desktop CPU may be individually faster and are of course far more versatile, there are simply too few of them under the hood.
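To get a feel for how such a comparison looks in code, here is a minimal sketch using PyTorch; it is not the exact benchmark behind the figures above, and the times you get will differ from machine to machine.

```python
# A rough GPU-vs-CPU sorting comparison (a sketch, not the benchmark quoted above).
# Assumes PyTorch with CUDA support is installed; results vary by hardware.
import time
import torch

N = 200_000_000  # 200 million random numbers, as in the example above

# CPU: sort on the host processor
cpu_data = torch.rand(N)
start = time.perf_counter()
torch.sort(cpu_data)
print(f"CPU sort: {time.perf_counter() - start:.2f} s")

# GPU: the same sort on the graphics card
gpu_data = cpu_data.cuda()
torch.cuda.synchronize()          # make sure the copy to the GPU has finished
start = time.perf_counter()
torch.sort(gpu_data)
torch.cuda.synchronize()          # wait for the kernel before stopping the clock
print(f"GPU sort: {time.perf_counter() - start:.2f} s")
```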
YOLO 11th generation
Enough theory, let's get practical! It's 2024, there's plenty of proven technology available, so we don't have to reinvent the wheel. We will build our image detector on the YOLO neural network from Ultralytics, which is now available in its 11th generation, is by far the most popular in its field, and can handle five basic tasks: object detection, instance segmentation, image classification, pose estimation, and oriented bounding box detection.
Each of these variants is also available in several pretrained sizes, from 1.6 million to 62 million parameters. Let me just remind you that the number of parameters corresponds to the complexity of the trained neural network; large language models can have hundreds of millions to billions of them. For the same reason, they need a supercomputer to run, while a network with just a few million parameters will run on (almost) anything.
Basic capabilities and variants of the YOLO image neuron
We can use a ready-made YOLO model directly in our own code via the practically foolproof Ultralytics library for Python. A few lines of code are genuinely enough: we just send the model a JPEG image and it soon spits out what it sees in it.
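In practice, such a program can look roughly like this; the model file and image name are just examples, and the Ultralytics library downloads the pretrained weights the first time they are requested.

```python
# Minimal object detection with a pretrained YOLO model (a sketch; file names are examples).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")            # small pretrained detection model, downloaded on first use
results = model("street_photo.jpg")   # run inference on a JPEG image

# Print what the model sees: class name and confidence for each detected object
for box in results[0].boxes:
    class_name = results[0].names[int(box.cls)]
    print(f"{class_name}: {float(box.conf):.2f}")
```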
Because the model is small, however, it is trained to recognize only a limited number of objects. What if we need it to detect something of our own that is missing from its database?
Retraining a network is like a toddler seeing a new toy: it doesn't learn completely from scratch
This is a job for our GPU (or, of course, a classic CPU as well, only with training many times slower): we can easily retrain any YOLO from the line-up on our own data.
I won't go into too much depth, but the trick of retraining a neural network is that we don't have to train it from scratch; we only change its last layer. What does this mean in practice?
If we were to teach a neural network to see from scratch, it would be extremely difficult. The network has to gradually model everything, much like a baby that opens its eyes for the first time and begins to perceive different colors and their transitions, and only later learns to distinguish and identify individual objects.
When retraining the YOLO neuron, we do not have to teach the network to see all over again; it is more like a toddler who already knows that the five cheesy figures above the crib are Temu toys, and we simply show it a few new ones it has not seen yet.
The poor kid will use his existing abilities and will be able to identify them relatively quickly. In essence, YOLO and other image AIs work quite similarly.
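To make the "only the last layer changes" idea a little more concrete, here is a minimal sketch of the general transfer-learning trick in plain PyTorch, with a torchvision classifier standing in purely as an illustration; the actual YOLO retraining is driven through the Ultralytics library, as shown below.

```python
# A generic illustration of keeping what a network already knows (a sketch,
# using a torchvision classifier rather than YOLO itself).
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # a network that can already "see"

# Freeze everything the network has learned so far
for param in model.parameters():
    param.requires_grad = False

# Replace only the final layer with a fresh one for our three weather classes;
# only this small part will be trained on the new data.
model.fc = torch.nn.Linear(model.fc.in_features, 3)
```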
We will make a weather detector
By far the simplest task for YOLO models is classification. In this mode, the AI does not learn exactly where and what object is in the image, so that we could mark it with a rectangle and keep working with it; it only determines whether the image as a whole corresponds to a given class, and with what probability.
The YOLO 11 classification variant is the simplest of all, and therefore we will use it in the experiment as well
So while a detection AI will say:
- I see a mug with 95% probability at position 135.45, with dimensions of 543×256 px,
a classification AI will only say:
- this image shows a mug, with 95% probability.
Classification is not as computationally demanding, and in many cases it is more than enough, including for today's assignment. We will try to train a weather detector that classifies whether a photo shows a clear, partly clear, or cloudy sky. It won't have to pinpoint where exactly the sky is, because that is completely irrelevant here.
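With the Ultralytics library, retraining the classification variant on our own photos boils down to a few lines. A minimal sketch, assuming the photos are sorted into training and validation folders named after the three classes (the paths, epoch count, and image size here are only illustrative):

```python
# Retraining the YOLO classification model on our own weather photos (a sketch).
# Assumes a dataset folder with train/ and val/ subfolders, each containing
# one directory per class, e.g. clear/, partly_clear/, cloudy/.
from ultralytics import YOLO

model = YOLO("yolo11n-cls.pt")        # small pretrained classification model

model.train(
    data="weather_dataset",           # path to the dataset folder described above
    epochs=50,                        # illustrative values; tune for your own data
    imgsz=224,
    device=0,                         # 0 = first CUDA GPU, or "cpu" for a (much slower) CPU run
)
```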
Our trained neuron in action, classifying the type of weather in an input video. The sky gradually worsens, so the number of frames in which the retrained YOLO sees a partly clear or cloudy sky increases
We could then use a neuron trained in this way, for example, on a Raspberry Pi with a camera that takes a picture of the view from your window every ten minutes and stores a record of what the weather was like. Or, if amateur meteorology isn't your thing, it could go through the archive of photos you have accumulated over the years on a NAS somewhere and write a weather keyword into each picture's EXIF metadata, making it easy to pull up all the good shots.
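A rough sketch of that Raspberry Pi scenario could look something like this; the capture command, file names, and the location of the trained weights are assumptions based on a default Ultralytics training run.

```python
# Periodically classify the view from a window camera (a sketch; the capture
# command and paths are placeholders for whatever your Raspberry Pi setup uses).
import csv
import subprocess
import time
from datetime import datetime

from ultralytics import YOLO

model = YOLO("runs/classify/train/weights/best.pt")  # weights produced by the training run above

while True:
    # Grab a still image from the camera (here via the Raspberry Pi camera tool)
    subprocess.run(["libcamera-still", "-o", "sky.jpg"], check=True)

    result = model("sky.jpg")[0]
    top_class = result.names[int(result.probs.top1)]  # best class, e.g. "cloudy"

    with open("weather_log.csv", "a", newline="") as log:
        csv.writer(log).writerow([datetime.now().isoformat(), top_class])

    time.sleep(600)  # ten minutes, as in the scenario above
```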
By the way, cloud AI photo galleries such as Google Photos and others work on a similar principle.