Updated: Apr 10, 2020
In this post we will examine some of the packages and features that come ready to use on the board, so you can put them to work toward your own goals.
The SONY Spresense is not just one of those development boards that comes empty, waiting to fulfill your curiosity about programming and electronics. The Spresense is a board with some purposes already defined and several features to support them. I already introduced the packages in the Spresense features figure, so in this section we will explore them in a little bit of detail to understand them better. I will start with the ones I care about the most: the Camera, GNSS (GPS), DNNRT, RTC (Real Time Clock) and the Storage. That doesn't mean they are the best capabilities on the Spresense; I think the Sensing engine is the most important one for SONY, but it might require specific sensors and drivers.
In natural evolution there were a few advances that were quite decisive for the survival of species; as a quick summary: the cell membrane, sexual reproduction, some sort of filament, leg or wing to allow creatures to move (with the flow, at first), and ears and eyes. The latter, especially, allowed a predator to search and go for proteins (even with a primitive brain).
As sight was important to primitive life, computer vision is important for autonomous and smart devices, and given that computer vision starts with the camera, I will go over some concepts about the Camera and the Spresense APIs that handle it (I will write about coding later).
This is a CMOS image sensor (not the Sony ISX012 used on the Spresense camera module). It is a sensor that captures photons and translates them into electricity; the rest of the camera is composed of its lens and the circuitry that makes an image available from it (color levels, control signals to mark the beginning of a new line, etc.). A computer doesn't have nerves or a brain, but it has wires, I2C (a way to interact with peripherals) and chips running algorithms. So, computer vision is one of those terms I don't consider a buzzword today; on the contrary, I think computer vision is one of the things that will change the world, for good and for bad.
You won't be exposed to interfacing or decoding bits and bytes if you need to use the camera; the API will do that for you. The camera has two modes of operation: streaming and pictures (still images). As shown in the specs, you can select among several pixel formats for the bits that represent a picture; you can even have two different formats if you want one for still pictures and another for video streaming (lower quality). Each format is useful for different things and you don't even need to worry about the details, but just in case you are curious:
RGB (Red, Green, Blue). I guess this is the best-known format, even for non-technical people. It uses three bytes to specify one pixel; each byte contains the amount of Red, Green or Blue, and the combination of them gives you a wide range of colors for one pixel. This format is great if you want to project the image on a screen.
JPEG. Also widely used, it is a compressed representation of an image. If you want to store images on a card, JPEG could be the best option (unless you are taking pictures for purposes other than looking at them later).
YUV (grayscale plus color information). In this context, this is the most versatile one for me. In the YUV format you have one byte representing the grayscale (luminance) value of a pixel and additional bytes carrying the color (chrominance) information. Besides images being smaller in memory (which is important on this kind of device), the API provides you with a lot of conversion tools from this format to other formats and sizes (just take a look at the source code of the APIs and you will notice that in one minute).
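To illustrate why YUV is so convenient, here is how the grayscale plane falls out of a YUV422 buffer with no math at all. This is an illustrative sketch, not the Spresense API, and it assumes the common YUYV byte order (Y0 U Y1 V per pixel pair); check the byte order your camera actually delivers.

```cpp
#include <cstdint>
#include <vector>

// Extract the grayscale (luminance) plane from a YUV422 buffer.
// Assumes YUYV byte order: Y0 U Y1 V per pixel pair, i.e. an average
// of 2 bytes per pixel, with every even byte being a Y sample.
std::vector<uint8_t> yuv422_to_gray(const uint8_t* yuv, int width, int height) {
    std::vector<uint8_t> gray(width * height);
    for (int i = 0; i < width * height; ++i) {
        gray[i] = yuv[i * 2];  // pick each luminance byte, skip chrominance
    }
    return gray;
}
```

No multiplications, no color-space conversion: the grayscale image is already sitting inside the YUV data, which is one reason scene modes and analysis code favor this format.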
Pixel formats are the foundation for manipulating images, but besides that, you will be able to set a few parameters such as the white balance (good for adapting to different light conditions) and scene modes, some of them pretty handy, such as grayscale. Most if not all of the scene modes will work on the YUV format.
Without leaving the conceptual arena, there are a few more details to understand: buffers, and how to get images. Regarding the first point, you can have more than one buffer at the same time, with different pixel formats if you want; a buffer simply holds a reasonably good image read from the camera. Whether you use them or not, once started, the camera will always be taking these samples at a given rate (you can specify that parameter as well, for instance, 15 frames per second for video streaming, 30 frames per second for still images).
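To get a feel for how much memory one of those buffers costs, a quick back-of-the-envelope sketch. The sizes follow directly from the bytes-per-pixel figures discussed above (JPEG is excluded because compressed size depends on the image content):

```cpp
#include <cstddef>

// Uncompressed frame sizes for the formats discussed above.
// RGB as described uses 3 bytes per pixel; YUV422 averages 2.
size_t frame_bytes_rgb888(int w, int h) { return (size_t)w * h * 3; }
size_t frame_bytes_yuv422(int w, int h) { return (size_t)w * h * 2; }
```

At QVGA (320x240) that is already 150 KB for a YUV422 frame and 225 KB for RGB, which is why buffer count and resolution matter so much on a memory-constrained board.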
There is one easy way to take a picture which implies only one method call; if the image is available you will get an image with the size, format and effects you specified. As mentioned before, the camera is always generating samples, and you can also work on those "preliminary" pictures by specifying a callback, for instance, if you want to detect something quickly or drive a display, as every digital camera does.
One last thing I would like to mention here is that the Spresense contains a 2D graphic accelerator, this accelerator will allow you to do some fast transformations on images, such as resizing them. I hope you got the essential concepts for dealing with a camera from code, if you already knew that, you won’t be reading this line anyway.
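To make the resizing idea concrete, here is what such a transformation boils down to, sketched in plain C++ with nearest-neighbor sampling on a single-channel (grayscale) image. The accelerator performs this kind of work in hardware, much faster; the function below is only an illustrative software sketch, not the Spresense API.

```cpp
#include <cstdint>
#include <vector>

// Resize a single-channel image using nearest-neighbor sampling:
// each destination pixel is mapped back to the closest source pixel.
std::vector<uint8_t> resize_nearest(const std::vector<uint8_t>& src,
                                    int sw, int sh, int dw, int dh) {
    std::vector<uint8_t> dst(dw * dh);
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            int sx = x * sw / dw;  // map destination x back to source x
            int sy = y * sh / dh;  // map destination y back to source y
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
    return dst;
}
```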
The GNSS (GPS) and the RTC (Real time clock)
Hm... no, I am not mixing topics, there is a relationship between the global positioning system and the Real Time Clock.
MCU (microcontroller) based boards are not like regular computers; one difference, among many more, is that they might not have a real time clock. Your laptop will remember the date and time each time you turn it on; that's because it has a separate clock (fed by a separate battery) to keep track of time. That is not the case for this kind of board.
Without GPS, you would need an additional small module: an RTC and its battery. So, GPS will give you the time, the date and the coordinates, but it also involves some other interesting concepts that are quite useful in several scenarios.
Many GPS receivers include a compass, though I guess not all of that functionality is accessible through the Spresense SDKs. A compass will tell you the orientation of the device, in other words, where the device is pointing. That is important for many reasons; for instance, it lets you calculate heading and bearing (where you are facing, and where you intend to go). But the main reason compasses are important to BCB is that you can tell where a camera is looking, so if you are tracking an object and that object, while moving from left to right, vanishes from the camera's field of view, you will be able to know whether left means east or west.
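As a taste of the math involved, the initial bearing from one coordinate to another comes from the standard great-circle formula. This is generic geodesy, not Spresense-specific code; a minimal sketch:

```cpp
#include <cmath>

// Initial bearing (forward azimuth) from point 1 to point 2, in degrees
// clockwise from true north. Inputs are decimal degrees.
double bearing_deg(double lat1, double lon1, double lat2, double lon2) {
    const double rad = 3.14159265358979323846 / 180.0;
    double phi1 = lat1 * rad, phi2 = lat2 * rad;
    double dlon = (lon2 - lon1) * rad;
    double y = std::sin(dlon) * std::cos(phi2);
    double x = std::cos(phi1) * std::sin(phi2) -
               std::sin(phi1) * std::cos(phi2) * std::cos(dlon);
    double deg = std::atan2(y, x) / rad;
    return std::fmod(deg + 360.0, 360.0);  // normalize to [0, 360)
}
```

Heading due east from the equator gives 90, due north gives 0, which is exactly the convention compasses and GPS receivers use.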
GPS and compasses are tricky things; they are subject to magnetic interference (if you place a receiver near a power cable or an antenna, it could get pretty confused). There are other details I won't cover in this work; it would never end if I did. If you are going to make heavy use of the GPS, I suggest you dig into the details, at least to know what is going on behind the scenes (for instance, what type of orientation is being used).

I am focusing on static devices for now, so concepts like geofencing are not important here. But if you are going to work with mobile things, or track moving things, geofencing is a nice thing to look at: you will be able to set a kind of cage for the tracked object and know when that object is outside the area it is supposed to be in.

The Spresense API will make the work easier for you, but it could be a good idea to understand the "language" used by GPS systems. I am not encouraging you to bypass the API and parse the NMEA strings yourself (NMEA stands for National Marine Electronics Association), but it is always great to know a little bit more, just in case. The same is true for AT commands; they are pretty useful for dealing with WiFi, and even LoRa cards can be configured using AT commands. The message: be productive, but also bear in mind that knowledge is a good investment.
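If you do peek at those NMEA strings, each sentence carries a simple integrity check: the two hex digits after the '*' are the XOR of every character between '$' and '*'. A minimal validator, just to illustrate what the API normally handles for you:

```cpp
#include <cstdio>
#include <string>

// Verify the checksum of an NMEA 0183 sentence such as "$GPGGA,...*47".
// The checksum is the XOR of all characters between '$' and '*'.
bool nmea_checksum_ok(const std::string& sentence) {
    size_t star = sentence.find('*');
    if (sentence.empty() || sentence[0] != '$' || star == std::string::npos)
        return false;
    unsigned sum = 0;
    for (size_t i = 1; i < star; ++i)
        sum ^= (unsigned char)sentence[i];
    unsigned given = 0;
    if (std::sscanf(sentence.c_str() + star + 1, "%2x", &given) != 1)
        return false;
    return sum == given;
}
```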
DNNRT, Deep Neural Network run-time and the Storage System.
I mentioned many times that this board is able to run AI on batteries. The SONY Spresense comes with an "inferencing" engine (it is not meant to train networks, just to run them) that will simplify some uses of AI. There are a few limitations on the type of deep neural network you can run due to the amount of memory available on the device; that is something to keep in mind (especially if you are using the camera and the DNNRT at the same time; fortunately, there are some tricks to get around it).
Again, I am not mixing topics here. The AI capabilities of the Spresense depend on the Storage System (the microSD card in this case, and therefore the extension board). That is because you will have to use a model to run the inference, and that model is ... a file! I will cover the procedures later; for now there is not much to say about the Storage System, but ... it is Japanese, it has to be perfect: you will have to use well-formatted FAT32 cards or it will complain. Sometimes I tried to use FAT32 cards and there was just a bit in the header that the Spresense didn't like. fdisk will be your friend!
Running AI on the Spresense is extremely simple and, at some point, amazing. What you will do is specify a model, create an input variable, run the inference, and you will get an output variable with the results. What is not easy at all is defining the network itself. The SONY Neural Network Console is a great tool, easy to use, and it will allow you to drag and drop layers at will; the problem lies in the DATA. If you are creating your own networks, take a deep look at the datasets you are using and analyze the learning curve. If you work with images, a couple of bad pictures can sometimes ruin the entire training process, and the same is true for other types of data. These trained networks depend on statistics; they are brute-force machines, so if your dataset contains the wrong data, the result will be the wrong one.
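For a classification network, that output variable typically holds one score per class, and picking the prediction is just finding the index of the highest score. A minimal sketch, independent of the DNNRT API itself:

```cpp
#include <cstddef>

// Return the index of the highest score (argmax): the predicted class
// of a classification network's output vector.
int argmax(const float* scores, size_t n) {
    int best = 0;
    for (size_t i = 1; i < n; ++i)
        if (scores[i] > scores[best]) best = (int)i;
    return best;
}
```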
I had to prepare a demo on these features and I took it pretty seriously, so I hired a model: a kind of plastic leopard Zoolander. I left the Spresense taking many pictures of it from all angles (always using the Spresense camera to do it), and I used a contrasting background to make things easier. I did the same with a plastic monkey but ... big cats are the stars here. Even taking that amount of care, some training processes had a pretty bad learning curve until I found a satisfactory one that achieved a high accuracy level. Finally, it all went great and my big cat Zoolander was a star in Silicon Valley; a happy-ending story.
Sound, Sound, SONY and Sound
I will be honest. When I first met the Spresense I said: wow, it supports lots of microphones and it even has a headphone jack, just in case you want to create your own Walkman. Well, that joke was pretty bad! I later encountered a few scenarios where that jack is vital.
The sound recording capabilities of the Spresense rely, again, on the extension board, which has dedicated pins for audio recording. This time the dependency has an advantage: all the other pins remain free for other uses.
In my case, I analyze audio on the fly, but sometimes I need to record WAV samples, so the microSD card will also be needed in those cases.
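For reference, a WAV file is nothing mysterious: raw PCM samples preceded by a small RIFF header. A sketch of the canonical 44-byte header follows; this is the general WAV format, not Spresense-specific code.

```cpp
#include <cstdint>
#include <cstring>

// The canonical 44-byte WAV (RIFF) header for uncompressed PCM audio.
// num_samples is the per-channel sample count.
struct WavHeader {
    char     riff[4];        // "RIFF"
    uint32_t chunk_size;     // 36 + data_size
    char     wave[4];        // "WAVE"
    char     fmt[4];         // "fmt "
    uint32_t fmt_size;       // 16 for PCM
    uint16_t audio_format;   // 1 = PCM
    uint16_t channels;
    uint32_t sample_rate;
    uint32_t byte_rate;      // sample_rate * channels * bits / 8
    uint16_t block_align;    // channels * bits / 8
    uint16_t bits_per_sample;
    char     data[4];        // "data"
    uint32_t data_size;      // num_samples * channels * bits / 8
};

WavHeader make_wav_header(uint32_t sample_rate, uint16_t channels,
                          uint16_t bits, uint32_t num_samples) {
    WavHeader h;
    std::memcpy(h.riff, "RIFF", 4);
    std::memcpy(h.wave, "WAVE", 4);
    std::memcpy(h.fmt,  "fmt ", 4);
    std::memcpy(h.data, "data", 4);
    h.fmt_size        = 16;
    h.audio_format    = 1;
    h.channels        = channels;
    h.sample_rate     = sample_rate;
    h.bits_per_sample = bits;
    h.block_align     = channels * bits / 8;
    h.byte_rate       = sample_rate * h.block_align;
    h.data_size       = num_samples * h.block_align;
    h.chunk_size      = 36 + h.data_size;
    return h;
}
```

Write this header, append the raw samples, and any player will open the file; it also makes clear why one second of 16-bit mono at 48 kHz costs about 94 KB of card space.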
You can reach a pretty nice dB range, especially if you are using analog microphones. Now, make sure your connections are clean, the resistors you use are the proper ones, and your soldering skills are better than mine; noise can be introduced by many factors, so be careful: if it sounds bad, it is your fault!
As the sound library is also built on top of the DSP, you can do pretty wild things on this device, like playing more than one track at the same time. I won't regret that joke; I won't ever need a karaoke-like application for my devices. What we are going to explore in detail is the WAV recording abilities (and further analysis) on the Spresense, to detect sound signatures and sound sources. That is not an easy task, but I will try to cover it as clearly as possible.
In the upcoming posts we will start designing and building a real device, like the one shown in the picture. We are going to play with fire: if we do things wrong, something will end up broken or, pay attention, you could be harmed. The idea is to cover every topic you need to know to build a real edge device, and that includes solar power and batteries.
Playing with batteries is a dangerous thing, they can explode, leak fluids, burn boards and catch fire, so, as a general rule for all the upcoming things, follow the 80/20 rule I learnt from a friend: 80% preparation, 20% action.