AI on MCUs
Introduction to AI on Edge Devices
Since I am using different types of technology in this project, I also require different approaches and tools; in the blog section you will find posts on concrete topics. This page tries to give an overview of the tools and concepts being used to develop Big Cat Brother. So, let's start from the very beginning.
What are Machine Learning, Tiny Machine Learning, Computer Vision and Machine Vision?
There are many terms in the world of AI. In fact, AI is a pretty generic term: it refers to any form of artificial intelligence focused on one kind of "intelligence" or skill (I prefer the term skill); you will also hear the term General Purpose AI, which studies a more generic way of learning. Inside AI you will also read terms like Deep Learning, Machine Learning, etc. Those are just details on how a machine can learn. Deep Learning means that a multi-layer (deep) Neural Network is in place; for instance, Convolutional Neural Networks (widely used for image processing) fall under the umbrella of Deep Learning.

People tend to associate Machine Learning with big machines in some Google room. What has been happening in recent years is that Machine Learning is being used on microcontrollers, yes, MCUs, even with their memory and power restrictions. Machine Learning needs memory, because in many cases the model will end up being a lot of matrices in memory and math applied to them. These models are being optimized to be executed on small computers, for instance by smashing data into plain arrays or vectors and optimizing data types (floats replaced by integers). These techniques for optimizing models so they can run on constrained hardware are known as tinyML.
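The "floats replaced by integers" trick can be sketched in plain Python. This is a minimal illustration of affine (scale + zero-point) quantization, the general idea behind storing model weights as 8-bit integers; the function names and the 8-bit range here are my own choices for the sketch, not any particular framework's API.

```python
def quantize(values, bits=8):
    """Map floats onto a signed integer range using a scale and zero point."""
    lo, hi = min(values), max(values)
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]   # pretend these are model weights
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)
# q holds small integers (1 byte each instead of 4), and approx stays
# very close to the original weights
```

The model loses a little precision but shrinks to roughly a quarter of its size, which is exactly the trade-off that makes inference on MCUs feasible.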
For my project I tend to analyze visual data. By visual data I don't mean just pictures, but any data represented in a graphical manner: motion and sound can be displayed in a way that is understandable to the human eye, so a computer eye will understand it too. Just think of spectrograms as a way of representing sound visually.
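As a sketch of that idea, here is a naive pure-Python magnitude spectrogram: slice a signal into frames and measure the strength of each frequency in each frame. A real implementation would use an FFT and windowing; the frame sizes and the test tone below are just illustrative choices of mine.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram of a 1-D signal via a naive per-frame DFT."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):  # keep the non-redundant half
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n, x in enumerate(frame))
            mags.append(abs(s))
        frames.append(mags)
    return frames

# A pure tone with exactly 8 cycles per 64-sample frame:
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(1024)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
# the energy concentrates in frequency bin 8, one bright row in the image
```

The resulting 2-D grid of magnitudes is exactly the kind of "picture" a vision model can classify, which is why sound and motion data fit into an image pipeline.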
The term Computer Vision refers to the ability of a computer to recognize objects inside an image, including multiple and complex objects. Machine Vision, on the other hand, is the name for the ability to recognize fewer objects in smaller images or parts of images. With this clarification in mind, let's get to the details and tools.
How does Machine Vision Work?
Neural Networks are nets that mimic the connections in your brain. No matter how many times data scientists repeat that, it is not true: they don't work like your brain, they are just "inspired" by it. The idea behind a Neural Network is to have a set of connected layers that will produce a result. In the case of a Deep Neural Network you will have a first layer that acts as the input, then a set of intermediate layers (called hidden layers) and then the output layer that produces the result. The layers in a Convolutional Neural Network use something called a convolution followed by an activation function, all of them fancy names.

Neural Networks learn from data, so to put a clear example, let's say that you want to understand what is in a picture. You will collect LOTS of pictures and you will somehow label them: for instance, "this picture is a Cat", or you will place all your Cat pictures in a Cat folder and the tools you use will automatically label every picture inside the folder as "Cat". This collection of pictures is called a "Dataset", and it is what the training process uses. You may have guessed that the input of your NN will be connected to these images, and it is true: the input to your network will be each one of the pixels of a picture. That is why you need to work with smaller pictures on MCUs; the higher the resolution, the more memory you need.
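To make that memory point concrete, here is a quick back-of-the-envelope sketch. The helper function and the two example resolutions are mine, chosen only for illustration:

```python
def input_memory_bytes(width, height, channels, bytes_per_value=1):
    """Memory needed just to hold one input image for the network."""
    return width * height * channels * bytes_per_value

# A small 96x96 grayscale input, stored as 8-bit values:
small = input_memory_bytes(96, 96, 1)      # 9216 bytes, about 9 KB
# A QVGA RGB input at the same 8-bit precision:
large = input_memory_bytes(320, 240, 3)    # 230400 bytes, about 225 KB
```

On an MCU with a few hundred KB of RAM, the second input alone could swallow most of the memory before a single layer runs, which is why tiny input resolutions are the norm.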
Then, what a convolution does is take each pixel and its surrounding pixels to "explore" features. The number of pixels taken into account is called the kernel size (you will find detailed explanations in the posts listed below this page), and again, the bigger the kernel size, the more memory required. During the convolution each pixel will be multiplied by a given number (the weight) and another number will be added (the bias); those are the most important parameters to start playing with in a NN. As multiplication and addition can generate large numbers, the activation function will bring those values back down. For instance, the hyperbolic tangent function could be used as an activation function to squash the output to values between -1 and 1.

Then, as we are talking about Deep Neural Networks, the outputs of this layer will be the input of the next one until you reach the last one. The hidden layers can apply more convolutions or other functions, but for sure they will include layers doing "down sampling", which means removing the less significant information to keep memory usage as low as possible. The repetition of this pattern (convolution + down sampling) reduces memory requirements by a lot. Finally, the net will have the output function, where it returns some sort of result; it could be the 10 most likely guesses or just one. What is happening here is that, given all the data you provided and the results of the calculations, the NN determines that statistically some combination of values in the picture means that the picture contains a Cat. It is just statistics: the NN doesn't know that the picture has a Cat because it recognized Cat ears, Cat eyes and all the features a human can recognize. I hope this was a balanced view before we move on.
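The convolution + activation + down-sampling pattern can be sketched in a few lines of plain Python. This is a toy version written by me, not a framework API, using ReLU as the activation and 2x2 max pooling as the down-sampling step:

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2-D convolution: each output value is a weighted sum of a
    kernel-sized neighbourhood of pixels, plus a bias."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def relu(fmap):
    """A common activation: clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Down sampling: keep only the strongest value in each window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A hand-made vertical-edge kernel applied to a tiny 4x4 image whose
# right half is bright: the edge between the halves lights up strongly.
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
edge = [[-1, 1],
        [-1, 1]]
fmap = max_pool(relu(conv2d(img, edge)))
```

Notice how a 4x4 input shrinks to a single value after one convolution + pooling round: that shrinking is exactly why the pattern keeps memory usage manageable.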
What do you need to work with tinyML ?
As I wrote before, one of the most important things is DATA, good data, real data. That said, you will also need some tools and practices; I will cover the latter in other posts.
The second thing you need is a framework to train models. There are several options you can evaluate, plus other tools that mount on top of frameworks to make the entire process easier (maintaining datasets is a complex thing in the long term). The framework will allow you to train a model: TensorFlow Lite is an example, nnabla from Sony is another one. What is important to mention is that the training process will be done on a regular computer or cloud service. You don't train models on MCUs; there could be exceptions to this, but they are few.
What you will do on MCUs is what is referred to as "inference", and that means getting data and exposing that data (or forwarding it) to the trained model to get the guess. The term guess is the right one, because here we are talking about probabilistic results. A model won't tell you "that picture contains a Cat"; the model will tell you "there is 94% certainty that that picture contains a Cat".
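That probabilistic output usually comes from a softmax at the end of the network, which turns raw scores into percentages. Here is a minimal sketch; the raw scores and labels are made up for illustration:

```python
import math

def softmax(scores):
    """Turn raw network outputs into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "no cat"]
raw_scores = [2.5, -0.3]          # hypothetical outputs from a trained model
probs = softmax(raw_scores)
best_label, best_prob = max(zip(labels, probs), key=lambda lp: lp[1])
print(f"{best_label}: {best_prob:.0%}")   # prints "cat: 94%"
```

The model never says "Cat", it only emits numbers; interpreting the highest probability as the answer (and deciding how much certainty is enough) is up to your application.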
The tools you need to know!
I am used to some of these tools and getting started with others. I will cover five tools that I consider will cover all the angles for you:
TensorFlow Lite
TensorFlow Lite is one of the most famous frameworks to define and train models; the Lite version is meant to run on smaller computers. Here you will be able to define the layers using several engines (Keras is the most popular, I guess).
nnabla
It is the same idea as TensorFlow but made by Sony: an open source Python library to define and train your models. It is quite good, and it collects all the know-how Sony developed while adding AI to their products (Aibo is an example). It will also allow you to export models to TensorFlow formats.
Edge Impulse
It is an online tool (free if you are a non-corporate developer) that sits on top of TensorFlow Lite to allow you to do things really fast. It has shortcuts such as "standard learning blocks", which are well-known TensorFlow models ready to be used in particular scenarios. It will also help you a lot in understanding your datasets, controlling versions and, in some cases, deploying the models to MCUs. The point about understanding your dataset is quite important.
Sony Neural Network Console
It is a graphical front end on top of nnabla. It will allow you to create and train models in a very intuitive and productive way. It doesn't help much with your datasets or with understanding why things are evaluated the way they are, but it is fast, accurate and, in my case, it can export models ready to be used on the Sony Spresense without touching a single line of Python!
openMV
Now... This is promising. openMV is something that will help you develop Machine Vision applications for the Arduino Portenta and its Vision Shield. What I find interesting about this tool is that it will allow you to work with all the things I covered here, but also with old-school computer vision techniques. I feel particularly attracted by that possibility because that is what I did before tinyML became so popular. What you will find here is a complete library, integrated with the Arduino Portenta H7 Vision Shield, and you have plenty of resources to work with: blob detection, circle detection (quite good for understanding some patterns), etc. It will be able to run TensorFlow Lite models too, and it is a Python tool running on the Arduino itself!
The final step
Once you have everything in place, you will be able to run your models on MCUs. Not all MCUs or boards are suitable for running Machine Vision; I will post some examples using the Sony Spresense and the Arduino Portenta H7 + Vision Shield. And again, you have a few options to run models.
In the case of the Spresense, it has a Deep Neural Network C Runtime built into the bootloader; this runtime will allow you to run the models you developed using nnabla or the Sony Neural Network Console. Recently, TensorFlow Lite support was announced for the Spresense, and Edge Impulse could support the board too.
In the case of the Arduino Portenta + Vision Shield there are more options, basically because you can work with two different types of firmware: the one you write for the Arduino Portenta, or the openMV firmware. In the first case you will be able to run TensorFlow models; Edge Impulse doesn't yet support the essential features for this board. By using the openMV firmware you will be able to run your TensorFlow models, or models developed using Edge Impulse (in this scenario deployment is supported by downloading a zip file, but sample gathering is not yet supported by Edge Impulse), and of course your own code using the extensive openMV library.
With this as a general introduction, I suggest you read the following posts:
These two posts are dedicated to the Sony Spresense and the Neural Network Console, in the context of anomaly detection using accelerometer data represented graphically.
- Machine Vision on the Arduino Portenta and Vision Shield LoRa - Part I
- Machine Vision on the Arduino Portenta and Vision Shield LoRa - Part II
These two posts cover the details of the options you have with this powerful combination. You will learn how to implement image classification using Edge Impulse, openMV and the Portenta!
I will keep posting about this, stay tuned!