This tutorial will guide you through building a gesture classifier with SensiML and deploying it to the Microchip SAMD21 Machine Learning Evaluation Kit. We'll also offer guidance on the factors to consider when designing your data collection process, along with solutions to common issues you may encounter while developing your application. To help you get up and running quickly with SensiML and the SAMD21 ML Evaluation Kit, this guide comes with a fully developed gesture classifier project that includes a dataset, a pre-trained machine learning model, and firmware source code.
- The firmware and MPLAB X project files can be found in the GitHub repository.
- The dataset used in this tutorial can be downloaded from the latest GitHub release.
- Pre-built firmware files for gesture recognition and data collection can be downloaded from the latest GitHub release.
Before we get started, you'll need to install and set up the required software as detailed in the steps below.
Install the MPLAB X IDE and XC32 compiler. These are required to load the gesture recognition project and to program the SAMD21 board. You can use the default, free license for the XC32 compiler as we will not need any of the pro functionality here.
Sign up for a free community edition account with SensiML if you have not already. We'll use this to process our sensor data and generate the gesture classifier library.
Download the SensiML Data Capture Lab from the SensiML Downloads page and install it. We'll use this to import data into our SensiML project.
Finally, head over to the GitHub releases page for this project and download the ml-samd21-iot-sensiml-gestures-demo.zip archive, which contains the dataset and pre-built firmware binaries for this guide.
Flashing the Gesture Classifier Demo Firmware
We are now set up to run the pre-built firmware. Go ahead and program your device with the firmware HEX file from the latest GitHub release using the following steps.
Plug your SAMD21 evaluation kit into your PC via USB. The SAMD21 should automatically come up as a USB Flash drive.
Open the ml-samd21-iot-sensiml-gestures-demo.zip archive downloaded previously and locate the gesture classifier demo HEX file corresponding to your sensor make:
- Bosch IMU: binaries/samd21-iot-sensiml-gestures-demo_bmi160.hex
- TDK IMU: binaries/samd21-iot-sensiml-gestures-demo_icm42688.hex
Drag and drop the HEX file onto the SAMD21 USB drive to program the device.
Gesture Classifier Firmware Overview
For a description of the demo firmware included with this project, including operation, usage, and benchmarks, see the "README" section in the GitHub repository.
Data Collection Overview
Before we jump into collecting data samples, we should put some consideration into the design of our data collection process; after all, the data that we collect will ultimately determine the kind of performance we can expect to achieve with our Machine Learning model.
For an in-depth guide on the data collection process in general, refer to the "Sensor Data Collection" section (pg 29) of SensiML's Building Smart IoT Devices with AutoML whitepaper.
Data Collection: Sensor Configuration
The first step in the data collection process is to determine the best sensor configuration for your application; this includes both the physical placement and installation of the sensor as well as signal processing parameters like sample rate and sensitivity.
Most likely, many of your design parameters for sensor configuration are fixed (e.g., due to a fixed board design or shared sensor usage), but it is worth considering whether the application design is optimal for your machine learning task and whether some design parameters should be changed. The question you should be asking at this point in the design is this: can I reasonably expect an algorithm to predict the desired output given the sensor data input? Data exploration (e.g., visualization) helps generate good initial hypotheses here, as does a good working knowledge of the signal domain (i.e., an understanding of the physical processes at work).
Here are a few specific questions we might ask during the sensor configuration stage and some possible answers:
- How should the sensor sampling parameters be configured? (i.e., sample rate, sensitivity/input range, etc.)
- Choose a sensor configuration that captures the events of interest in a reasonably compact representation, with a good signal-to-interference ratio.
- How should the sensor be placed? (i.e., mounting and orientation)
- Choose a placement that will minimize the susceptibility to interference (such as vibrations from an engine).
- How should the sensor be fixed?
- Choose a method that will ensure the consistency between readings over time and across different sensor deployments.
The main sensor configuration parameters chosen for this project and the justification behind their choices are as follows:
- Accelerometer only
- Chosen gestures should be mostly invariant to the device rotations
- 100 Hz Sample Rate
- Chosen gestures have a frequency range of typically < 5 Hz (i.e., 10 Hz Nyquist rate), but 100 Hz was chosen for flexibility in the data collection process
- 16 g accelerometer range
- Least sensitive setting since we're not interested in micro-movements
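The sample-rate reasoning above can be checked with a few lines of arithmetic. This is only a minimal sketch: the 5 Hz and 100 Hz figures come from the list above, while the function and variable names are illustrative.

```python
# Sketch: compare the chosen sample rate against the Nyquist rate.
# The 5 Hz and 100 Hz values are taken from the text; names are illustrative.

def nyquist_rate_hz(max_signal_hz: float) -> float:
    """Minimum sample rate needed to represent content up to max_signal_hz."""
    return 2.0 * max_signal_hz

max_gesture_hz = 5.0    # gestures are typically below 5 Hz
sample_rate_hz = 100.0  # sample rate chosen for this project

required_hz = nyquist_rate_hz(max_gesture_hz)
oversampling = sample_rate_hz / required_hz

print(f"Nyquist rate: {required_hz:.0f} Hz")
print(f"Oversampling factor at {sample_rate_hz:.0f} Hz: {oversampling:.0f}x")
```

The tenfold margin is the "flexibility" mentioned above: the data can later be downsampled without needing to be collected again.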
Data Collection: Collection Protocol
The next step in the data collection process is putting together a protocol to use when collecting your data.
Roughly speaking, we want to achieve three things with the protocol:
A reproducible methodology for performing data collection
A reproducible methodology ensures that the data collection process is performed in a prescribed manner, with minimal variations between measurements, and ensures the integrity of our data.
Sampling parameters that will ensure we have a sufficient number of samples for development, and enough diversity (i.e., coverage) to enable our end model to generalize well
A good rule of thumb is that you need at least tens of samples for each class of event you want to classify (30 is a good starting point); however, this number may increase depending on the variance between the samples. Taking the gestures application as an example, if you wanted to detect a circle gesture, but wanted your model to be invariant to the size or speed of the circle gesture, you would need many more samples to cover the range of performances.
Another thing to consider when selecting a sample size is that you will invariably capture noise (i.e., unintended variances) in your samples; the hope is that with enough samples, the training algorithm will have enough information to learn to discriminate between the signal of interest and the noise.
A word to the wise: start small! Anticipate that the development of your data collection process will require some iteration; refine your process first, then start scaling up.
A set of metadata variables to be captured during the collection process that can be used to explain the known variances between samples
Metadata variables (or tags) are the breadcrumbs you leave yourself to trace your data samples once they're joined into a larger sample pool; among other things, these tags can be used to explore subgroups within your data (e.g., all gestures performed by a single test subject) and to track down any data issues you might uncover later (e.g., hardware problems, outlier samples, etc.).
For this demo project, we created a data protocol document that specified how gestures should be performed and what metadata should be collected along with them. To illustrate, below are the directives that constrained how the test subject would perform the gestures for collection. The text in italics defines the fixed experimental parameters that we explicitly control.
- Subject should perform gestures that follow the specified trajectory description (e.g., clockwise wheel)
- Subject should perform gestures smoothly, in a way that feels natural to them
- Subject should perform gesture continuously for at least ten seconds
- Subject should be standing
- Subject should use dominant hand
- Subject should hold the board with a thumb and forefinger grip with the cord facing down as shown in Figure 2.
In addition, the following metadata values were logged for each data collection.
- Date of capture
- SAMD21 Test board ID
- Test environment ID
- Test subject ID
- (For idle class data only) Placement and orientation of SAMD21 board
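One way to keep these tags attached to each capture is a small record type. The sketch below is an assumption about how such a log could be organized, not the format used for this project's dataset: the field names, the extra `label` field, and all example values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class CaptureMetadata:
    """Tags logged with one data capture (field names are hypothetical)."""
    capture_date: date
    board_id: str
    environment_id: str
    subject_id: str
    label: str                            # gesture class; assumed extra field
    idle_placement: Optional[str] = None  # only filled in for idle-class data

# Made-up example records, for illustration only.
captures = [
    CaptureMetadata(date(2021, 1, 5), "board-01", "env-A", "subj-01", "wheel"),
    CaptureMetadata(date(2021, 1, 5), "board-01", "env-A", "subj-02", "wheel"),
    CaptureMetadata(date(2021, 1, 6), "board-02", "env-B", "subj-01", "idle",
                    idle_placement="flat on desk, cord facing left"),
]

def by_subject(records: List[CaptureMetadata], subject_id: str) -> List[CaptureMetadata]:
    """Return the subgroup of captures performed by one test subject."""
    return [r for r in records if r.subject_id == subject_id]

def by_board(records: List[CaptureMetadata], board_id: str) -> List[CaptureMetadata]:
    """Return all captures taken with one board, e.g. to trace a hardware issue."""
    return [r for r in records if r.board_id == board_id]

print(len(by_subject(captures, "subj-01")), "captures from subj-01")
```

Filters like `by_subject` and `by_board` are the kind of subgroup exploration and issue tracing that the tags are meant to support.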
Data Collection: Post-processing
Finally, all data samples were post-processed to form the final dataset.
- Data was split into exactly ten-second samples
- Samples were formatted as CSV files with the following naming convention:
- Samples were split into folds with 80% being allocated to development and 20% to testing
- Split was stratified so that the proportion of samples per class and per subject ID was the same for the development and test sets.
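The post-processing steps above could be sketched roughly as follows. This is not the script used to build the dataset: the tuple layout, the function names, the fixed random seed, and the choice to drop any trailing partial segment are all assumptions made for illustration.

```python
import random
from collections import defaultdict

SAMPLE_RATE_HZ = 100
SEGMENT_SECONDS = 10
SEGMENT_LEN = SAMPLE_RATE_HZ * SEGMENT_SECONDS  # 1000 rows per ten-second sample

def split_into_segments(rows, segment_len=SEGMENT_LEN):
    """Cut a recording into full-length segments; a trailing remainder is dropped
    (an assumption; the text only says samples are exactly ten seconds long)."""
    n_full = len(rows) // segment_len
    return [rows[i * segment_len:(i + 1) * segment_len] for i in range(n_full)]

def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (sample_id, label, subject_id) tuples into development and test sets,
    keeping the same proportion of every (label, subject_id) group in both."""
    groups = defaultdict(list)
    for sample in samples:
        _, label, subject_id = sample
        groups[(label, subject_id)].append(sample)

    rng = random.Random(seed)
    dev, test = [], []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        dev.extend(members[n_test:])
    return dev, test

# Made-up example: 2 labels x 2 subjects x 10 samples each.
samples = [
    (f"{label}-{subject}-{i}", label, subject)
    for label in ("idle", "wheel")
    for subject in ("subj-01", "subj-02")
    for i in range(10)
]
dev_set, test_set = stratified_split(samples)
print(len(dev_set), "development samples,", len(test_set), "test samples")
```

Splitting per (class, subject) group rather than over the whole pool is what keeps both proportions equal across the two sets.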
Data Collection: Data Capture Tools
For this guide, we'll be using the pre-built dataset included with the gestures demo, but to build your dataset you can use the MPLAB X Data Visualizer and Machine Learning plugins. These plugins can be used in tandem to capture samples and export them as a CSV or DCLI file that can be easily imported into SensiML's Data Capture Lab.
To use the ML Evaluation Kit with MPLAB Data Visualizer, you'll need to use the data logger firmware maintained on the "SAMD21 ML Evaluation Kit Data Logger" page. For convenience, pre-built binaries for the sensor configuration used in this project have been packaged in the ml-samd21-iot-sensiml-gestures-demo.zip archive included in the latest release:
- Bosch IMU: binaries/samd21-iot-data-visualizer_bmi160_100hz-axayzgxgygz-16g-2000dps.hex
- TDK IMU: binaries/samd21-iot-data-visualizer_icm42688_100hz-axayzgxgygz-16g-2000dps.hex
Refer to the "Using the ML Partners Plugin with SensiML" guide for more information on the data capture process.
Data Import with Data Capture Lab
Let's move on to importing our data into a new SensiML project.
Extract the ml-samd21-iot-sensiml-gestures-demo.zip archive containing the gestures dataset into a working directory.
Open up the SensiML Data Capture Lab tool and create a new project for this guide.
In the resulting dialog box, navigate to the folder where you previously extracted the ml-samd21-iot-sensiml-gestures-demo.zip archive and open the DCLI file located at dataset/train/train.dcli. Step through the resulting import prompts leaving all settings at their default until you reach the Select a Device Plugin window.
After selecting the device plugin, the Plugin Details page will appear; click Next to move forward to the Sensor Properties page. On the properties page, fill out the fields to match the configuration shown in Figure 5 (or select the ICM sensor if you are using the TDK IMU), then click Next.
Repeat steps three and four to import the test samples (dataset/test/test.dcli); this is the data that will be used to validate the model. When prompted, use the same sensor configuration that we created in the previous step.
At this point, our project is set up with the data we need and we can move on to the model development stage.
Let's now move into the Analytics Studio to generate our classifier model.
Open up the Analytics Studio in your web browser and log in.
Navigate over to the Prepare Data tab to create the query that will be used to train your machine learning model. Fill out the fields as shown in Figure 9; these query parameters will select only the samples in the training fold, and only use the accelerometer axes.
The SensiML Query determines what data from our dataset will be selected for training. We can use this to exclude samples (e.g., our test samples) or exclude data axes (e.g., gyroscope axes).
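Conceptually, the query acts as a filter over captures and columns. The sketch below only illustrates that idea in plain Python; it is not the SensiML API, and the column names, the `fold` key, and the example data are hypothetical.

```python
# Conceptual sketch of a query: keep only captures in one fold and only
# the requested sensor columns. All names and values here are hypothetical.

ACCEL_COLUMNS = ("AccelerometerX", "AccelerometerY", "AccelerometerZ")

def run_query(captures, fold, columns):
    """Return captures from the given fold, reduced to the given columns."""
    selected = []
    for capture in captures:
        if capture["fold"] != fold:
            continue  # e.g. skip test samples when building the training query
        rows = [{name: row[name] for name in columns} for row in capture["rows"]]
        selected.append({"name": capture["name"], "rows": rows})
    return selected

# Made-up captures with one row each, including a gyroscope axis to be excluded.
example_captures = [
    {"name": "capture_a", "fold": "train",
     "rows": [{"AccelerometerX": 1, "AccelerometerY": 2,
               "AccelerometerZ": 3, "GyroscopeX": 4}]},
    {"name": "capture_b", "fold": "test",
     "rows": [{"AccelerometerX": 5, "AccelerometerY": 6,
               "AccelerometerZ": 7, "GyroscopeX": 8}]},
]

training_data = run_query(example_captures, fold="train", columns=ACCEL_COLUMNS)
print([c["name"] for c in training_data])
```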
Switch over to the Build Model tab to start developing the machine learning model. Fill out the fields as shown in Figure 10. Note that the only settings that need to be changed from their defaults are the Query (created in the last step), the Optimization Metric (f1-score), and the Window Size (200 samples).
Due to the imbalance in the gesture dataset's class distribution, choosing the accuracy optimization metric here would bias the model optimization towards the classes with more samples; hence we opt for the f1-score, which provides a more representative measure of model performance.
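A small worked example shows why accuracy can mislead on imbalanced data. The class counts below are invented for illustration and are not the distribution of the gestures dataset; the macro-averaged F1 used here is one common variant and may differ from the exact metric Analytics Studio computes.

```python
# Sketch: accuracy vs. macro-averaged F1 for a classifier that always
# predicts the majority class. The 90/10 class split is a made-up example.

def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, computed from true/false positives and negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)

y_true = ["idle"] * 90 + ["wheel"] * 10
y_pred = ["idle"] * 100  # degenerate model: always predicts the majority class

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"accuracy = {accuracy:.2f}, macro F1 = {score:.2f}")
```

The degenerate model looks strong by accuracy but scores poorly on F1, because it never detects the minority class.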
We choose a Window Size of 200 (i.e., two seconds at the 100 Hz IMU sample rate) here since that will be long enough to cover at least one cycle of the gestures we're interested in.
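The window arithmetic can be written out explicitly. The sketch below uses the figures from this guide (100 Hz, 200-sample windows, ten-second samples) and assumes non-overlapping windows, which the text does not state.

```python
# Sketch: window duration and window count, assuming non-overlapping windows.

SAMPLE_RATE_HZ = 100   # IMU sample rate
WINDOW_SIZE = 200      # samples per window
SAMPLE_SECONDS = 10    # length of each dataset sample

window_seconds = WINDOW_SIZE / SAMPLE_RATE_HZ
windows_per_sample = (SAMPLE_SECONDS * SAMPLE_RATE_HZ) // WINDOW_SIZE

# Slowest repetition rate whose full cycle still fits inside one window.
min_gesture_hz = 1.0 / window_seconds

print(f"window length: {window_seconds} s")
print(f"windows per ten-second sample: {windows_per_sample}")
print(f"one full cycle fits for gestures at {min_gesture_hz} Hz or faster")
```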
Once you've entered the pipeline settings, click the Optimize button. This step will use AutoML techniques to automatically select the best features and machine learning algorithm for the gesture classification task given your input data. This process will usually take several minutes.
Once the Build Model optimization step is completed, navigate to the Test Model tab.
Select the pipeline that we created in the previous step.
Select one of the models generated in the previous step. Usually, the rank 0 model is the best compromise among all the generated candidate models.
Select the upside-down triangle icon in the Fold column and select test to filter the data so only the test samples are selected.
Click the ellipsis (…) located at the left-most column of the table and select Select All to include all test samples.
Click Compute Summary to generate the confusion matrix for the test samples. This should take a few minutes; once completed, you will be presented with a table like the one shown in Figure 12 summarizing the classification results.
The Confusion Matrix plots the classification results for the true labels (rows) versus the predicted labels (columns). The right-most column shows the Sensitivity (or Recall) score (true positive predictions / total actual positives) for each class, and the bottom-most row shows the Precision score (true positive predictions / total positive predictions).
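To make the two scores concrete, the sketch below computes them from a small confusion matrix laid out as described, with true labels as rows and predicted labels as columns. The counts are invented and do not come from the results in this guide.

```python
# Sketch: per-class recall and precision from a confusion matrix.
# Rows are true labels, columns are predicted labels; counts are made up.

labels = ["idle", "wheel"]
matrix = [
    [45, 5],   # true idle:  45 predicted idle, 5 predicted wheel
    [10, 40],  # true wheel: 10 predicted idle, 40 predicted wheel
]

def recall_per_class(m):
    """Diagonal entry divided by its row sum (all samples truly in that class)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]

def precision_per_class(m):
    """Diagonal entry divided by its column sum (all samples predicted as that class)."""
    n = len(m)
    column_sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [m[i][i] / column_sums[i] if column_sums[i] else 0.0 for i in range(n)]

recall = recall_per_class(matrix)
precision = precision_per_class(matrix)

for name, r, p in zip(labels, recall, precision):
    print(f"{name}: recall = {r:.3f}, precision = {p:.3f}")
```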
Finally, navigate to the Download Model tab to deploy your model. Fill out the Knowledge Pack settings using the Pipeline, Model, and Data Source you created in the previous steps, and select the Library output format (see Figure 13 for reference) then click the Download button.
The Library format, available to all SensiML subscription tiers, will generate a pre-compiled library for the generated machine learning model, along with a header file defining the user API.
You now have a compiled library for the SAMD21 containing your machine learning model that you can integrate into your project. For more detailed information on the Analytics Studio, head over to SensiML's documentation page.
Knowledge Pack Integration
Let's take our SensiML library (i.e., knowledge pack) and integrate it into an existing MPLAB X project using the gestures demo project as a template.
Use the MPLAB X project that accompanies this guide as a starting point for your project. This will save you the trouble of doing the hardware and project configuration yourself.
Download the gesture demo source code from the GitHub repository or clone the repository using git clone https://github.com/MicrochipTech/ml-samd21-iot-sensiml-gestures-demo/. In addition to the demo source code, this repository contains the MPLAB X project pre-configured for using a SensiML knowledge pack.
Unzip the contents of the SensiML knowledge pack (the ZIP archive downloaded in the previous section) into the root folder where your MPLAB X project is located, so that it overwrites the existing knowledgepack folder.
Navigate to the knowledgepack/knowledgepack_project folder in the unzipped knowledge pack and locate app_config.h. Move this file to the firmware's src directory (same root folder as the .X project), replacing the existing app_config.h. This ensures that your application's sensor configuration matches the one used in the development of the model.
Open up the samd21-iot-sensiml-gestures-demo.X project in the MPLAB X IDE.
In MPLAB X, open up the main.c file under Source Files.
Scroll down to where the class_map variable is defined (see Figure 14 for reference). Modify the class_map strings to match up with the class mapping that was displayed in the Download Model step of the Analytics Studio. Note that the "UNK" class (integer 0) is reserved by SensiML, so this mapping won't change.
Scroll a bit further down inside the main while loop until you reach the section shown in Figure 15, which begins with a call to buffer_get_read_buffer. This is the core of the code: it calls into the SensiML knowledge pack via the kb_run_model function for every sample we get from the IMU, and calls kb_reset_model whenever an inference is successfully made.
Make modifications to the LED code here to reflect your class mapping.
The kb_run_model function is the main entry point into the SensiML SDK; it internally buffers the samples we give it and makes an inference when it has enough data. For the project in this guide, an inference will be made every 200 samples; this corresponds to the Window Size parameter we set in the Build Model step of the model development in Analytics Studio. The kb_run_model function returns a negative integer until it has enough data to run inference.
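The buffering behavior described above can be simulated in a few lines. The sketch below is a Python mock written purely for illustration: it is not the SensiML SDK and does not reproduce the real C signatures of kb_run_model or kb_reset_model. The `MockKnowledgePack` class, its stand-in classifier, and the class_map entries other than "UNK" are hypothetical.

```python
# Mock simulation of the run/reset loop. NOT the SensiML SDK: every name
# here is a stand-in for the behavior described in the text.

WINDOW_SIZE = 200  # samples per inference, as configured in this guide

class MockKnowledgePack:
    def __init__(self, window_size, classify):
        self._window_size = window_size
        self._classify = classify  # stand-in for the trained model
        self._buffer = []

    def run_model(self, sample):
        """Buffer one sample; return a class index once a full window is
        available, otherwise a negative integer."""
        if len(self._buffer) < self._window_size:
            self._buffer.append(sample)
        if len(self._buffer) < self._window_size:
            return -1
        return self._classify(self._buffer)

    def reset_model(self):
        """Discard the buffered window so the next one can start."""
        self._buffer.clear()

# Index 0 is the reserved "UNK" class; the other names are placeholders.
class_map = ["UNK", "gesture A", "gesture B"]

kp = MockKnowledgePack(WINDOW_SIZE, classify=lambda window: 1)
results = []
for n in range(450):            # 450 fake IMU samples
    sample = (0, 0, 0)          # placeholder (ax, ay, az) reading
    ret = kp.run_model(sample)
    if ret >= 0:                # an inference was made
        results.append((n, class_map[ret]))
        kp.reset_model()        # get ready for the next window

print(results)
```

In this simulation, inferences land on the 200th and 400th samples, and the final 50 samples are left waiting in the buffer for a window that never completes.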
That's it! You now have a basic understanding of how to develop a gesture-recognition application with SensiML and the SAMD21 ML evaluation kit.
For an in-depth guide on the data-driven design process see SensiML's "Building Smart IoT Devices with AutoML" whitepaper.
To learn more about the SensiML Toolkit, including tutorials for other machine learning applications, go to the SensiML "Getting Started" page.