This tutorial will guide you through the process of building a machine learning model with SensiML that can monitor the operational state of a fan including whether the fan is on, what speed setting it's set at, and whether the fan is experiencing a fault condition (tapping or shaking). We'll cover the development process from end to end including data collection, model development, and model deployment using the SAMD21 Machine Learning Evaluation Kit. In addition to a step-by-step walkthrough on the usage of the ML development tools, this tutorial will provide high-level details on the data collection and model development process that can be applied to your own applications.
A fully developed fan condition monitoring project including dataset, pre-trained model, and firmware source code is provided with this guide to help you get your machine learning project up and running quickly.
- A fan of your choosing. For this example, we'll be using the Honeywell HT-900 table fan as shown in Figure 3.
- Standard mounting putty such as the Loctite Fun-Tak as shown in Figure 4.
- The firmware and MPLAB X project files can be found in the GitHub repository that accompanies this demo.
- The dataset used in this tutorial can be downloaded from the latest GitHub release.
- IMU data collection firmware along with usage information for the Machine Learning Evaluation Kit can be downloaded from the SAMD21 ML Evaluation Kit Data Logger repository.
- Pre-built firmware files for the fan demo and for data collection using the same settings as covered in this guide can be downloaded from the latest GitHub release.
Before we get started, you'll need to install and set up the required software as detailed in the steps below.
Install the MPLAB X IDE and XC32 compiler. These are required to load the demo project and to program the SAMD21 board. You can use the default, free license for the XC32 compiler as you will not need any of the pro functionality here.
Sign up for a free community edition account with SensiML if you have not already. We'll use this to process our sensor data and generate the fan condition classifier library.
Download the SensiML Data Capture Lab from the SensiML Downloads page and install it. We'll use this to capture and label data for our SensiML project.
In this example, we're targeting a predictive maintenance application. The question we are trying to answer is whether we can use the analytic abilities of machine learning to monitor and predict machine failure, thereby reducing maintenance costs and increasing uptime. To demonstrate how you can approach this type of application, we'll develop a classifier model using the SensiML Analytics Toolkit that can recognize the state of a Honeywell HT-900 fan. The model is deployed on a SAMD21 Machine Learning Evaluation Kit mounted to the fan housing and can distinguish between the three speed modes of the HT-900 fan as well as two disturbance states: tapping and shaking. The example application setup is pictured in Figure 5.
Data Collection Overview
Now let's cover how we should collect the data samples that will be used to develop the fan state classifier model.
For an in-depth guide on the data collection process in general, refer to the "Sensor Data Collection" section (page 29) of SensiML's "Building Smart IoT Devices with AutoML" whitepaper.
The first step in the data collection process is to determine an appropriate sensor configuration for your application; this includes the geometric placement of the sensor, the installation method, and the signal processing parameters like sample rate and sensitivity.
To affix the SAMD21 board to the fan, a standard mounting putty was used; this is the same type as is used to mount lightweight items to a wall. Mounting putty has the following desirable properties for this application:
It is relatively rigid, so it transfers vibrations well (some adhesive mounting squares, for example, can be quite soft).
It can maintain its hold for extended periods of time.
It can be easily installed and is not permanent.
In terms of placement, the board was installed in its natural orientation (i.e. the accelerometer should nominally read X=0 Y=0 Z=1g) with the back of the board being attached to the topmost area of the housing. There is no particular reason this placement was chosen other than it is the easiest way to install the board. A short demonstration of the sensor installation process is shown in Figure 7.
The downside of this installation configuration compared to a more permanent fixture is that there will be a higher degree of variability between installation instances and over time, especially if exposed to varying temperature and humidity. However, this configuration should be good enough for prototyping on most fan types.
Sensor Sampling Configuration
The sensor sampling parameters are summarized below:
- Sensor: 3-axis Accelerometer + 3-axis Gyrometer
- Sample Rate / Frequency Range: 400 Hz with 4x oversampling (~40 Hz 3 dB cutoff)
- Accelerometer Full Scale Range: +/-2 G (most sensitive setting)
- Gyrometer Full Scale Range: +/-125 DPS (most sensitive setting)
Note: This particular configuration (400 Hz and 4x oversampling) is only supported for the BMI160 sensor.
Let's discuss some of the rationales behind these choices.
First off, both the accelerometer and gyrometer give useful information about the fan's vibrations. Rather than trying to eliminate the least informative axes by hand, we'll rely on SensiML's AutoML process to select the best axes for the task.
Secondly, the range parameters were chosen to maximize the use of the available digital range from the IMU sensor (16-bit signed samples [-32768,32767]). Fan vibration signals captured during the activities of interest (shaking, tapping, fan on) were found to be well below the full-scale range in amplitude, so the most sensitive setting was chosen for both the accelerometer and gyrometer.
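To make the range choice concrete, here is a minimal Python sketch of how raw 16-bit counts map to physical units at the two full-scale settings listed above. It assumes an ideal linear mapping in which full scale corresponds to 32768 counts; the function and constant names are illustrative and are not part of any SensiML or Microchip API.

```python
# Full-scale settings from the sensor configuration in this guide.
ACCEL_FULL_SCALE_G = 2.0      # +/-2 g
GYRO_FULL_SCALE_DPS = 125.0   # +/-125 degrees per second
COUNTS_AT_FULL_SCALE = 32768  # magnitude of the most negative int16 value


def counts_to_units(raw_count, full_scale):
    """Convert a signed 16-bit sample to physical units (assumes a linear mapping)."""
    return raw_count * full_scale / COUNTS_AT_FULL_SCALE


def fraction_of_range_used(peak_count):
    """Fraction of the available digital range occupied by a peak reading."""
    return abs(peak_count) / COUNTS_AT_FULL_SCALE


# A board at rest in its natural orientation should read about 1 g on Z,
# which is half of the +/-2 g range (16384 counts under this assumption).
z_at_rest_g = counts_to_units(16384, ACCEL_FULL_SCALE_G)
```

A check like `fraction_of_range_used` on the largest reading seen during shaking or tapping is one way to confirm that the signal stays inside the selected range without clipping.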
Finally, the sampling rate was chosen to capture the dominant low-frequency mode of vibration from the fan as captured by the mounted IMU. To determine the frequencies of interest, data was captured using the SAMD21 data logger firmware with the BMI160's maximum sampling rate selected (1600 Hz). The HT-900 fan was set to its maximum speed and several seconds of the fan running undisturbed were captured. A power spectrum plot, reproduced in the figure below, revealed a few dominant frequency components: one at 120 Hz, two at ~90 Hz, and one at ~30 Hz.
The analysis was repeated at different fan speeds and yielded the following insights:
- The 120 Hz component is a byproduct of the power supply and does not vary with fan speed.
- The components centered around 90 Hz in the plot above merged into one component when the fan was slowed down. This is an indication of non-linear behavior; while this information could be useful in model development, it could also lead to overfitting and generally a more complex model.
- The component at around 30 Hz seemed to be the most reliable indicator of vibration behavior, as it responded linearly to the variation in fan speed.
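The spectrum analysis described above can be sketched in a few lines of Python. This is a minimal illustration using a direct DFT from the standard library on a synthetic signal, not the tool used to produce the plot in this guide; the signal, its amplitudes, and the function names are assumptions chosen so that the tones fall exactly on frequency bins.

```python
import cmath
import math


def power_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, power) pairs for bins from 0 Hz up to Nyquist."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset first
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        spectrum.append((k * sample_rate_hz / n, abs(acc) ** 2 / n))
    return spectrum


def dominant_frequencies(spectrum, count):
    """Return the `count` bin frequencies with the most power, strongest first."""
    ranked = sorted(spectrum, key=lambda pair: pair[1], reverse=True)
    return [freq for freq, _ in ranked[:count]]


# Synthetic stand-in for a capture: 0.2 s at 1600 Hz with 30 Hz and 120 Hz tones.
SAMPLE_RATE_HZ = 1600
signal = [math.sin(2 * math.pi * 30 * i / SAMPLE_RATE_HZ)
          + 0.5 * math.sin(2 * math.pi * 120 * i / SAMPLE_RATE_HZ)
          for i in range(320)]
peaks = dominant_frequencies(power_spectrum(signal, SAMPLE_RATE_HZ), 2)
```

With real accelerometer data the peaks rarely land exactly on a bin, so energy spreads into neighboring bins; a longer capture and a window function would typically be used before picking peaks.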
Looking at the Bosch BMI160 manual, the 100 Hz setting satisfies the Nyquist sample rate requirement for the ~30 Hz component we observed. It also has a low-pass filter cutoff of ~40 Hz, so it should eliminate the frequency components we want to ignore. However, further testing showed that this setting produced somewhat noisy data (possibly because the activity is so close to the filter cutoff frequency), so the IMU chip's onboard oversampling functionality was used (at 4x oversampling) to provide a cleaner signal.
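As a quick illustration of the Nyquist reasoning, the sketch below checks each observed component against a candidate sample rate and computes where it would appear if it reached the sampler unfiltered. The helper names are illustrative; the component frequencies are the approximate values reported above.

```python
def nyquist_ok(signal_hz, sample_rate_hz):
    """True if the sample rate is more than twice the signal frequency."""
    return sample_rate_hz > 2 * signal_hz


def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a tone after sampling with no anti-alias filtering."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Components observed in the 1600 Hz capture of the HT-900 (approximate values).
for component_hz in (30, 90, 120):
    print(component_hz,
          nyquist_ok(component_hz, 100),
          alias_frequency(component_hz, 100))
```

At a 100 Hz sample rate only the 30 Hz component satisfies the Nyquist criterion. Without filtering, the 90 Hz and 120 Hz components would fold down to 10 Hz and 20 Hz, which is why the ~40 Hz low-pass cutoff matters here.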
Keep in mind that the conclusions above were derived from analysis on a specific fan, so while you can apply similar principles to find a sensible sensor configuration in your own application, the specific sensor configuration derived here probably won't be optimal if you are using a different fan setup.
Data Collection Protocol
The next step in the data collection process is putting together a protocol to use when collecting your data. This includes deciding how many samples to collect, what metadata parameters to collect, and other parameters that determine the procedure by which data is collected.
Data Collection: Metadata
Let's cover metadata first as this determines how we contextualize our data. The metadata variables determined for this example application are summarized in the table below.
|Metadata Variable|Description|
|---|---|
|Fan ID|Tag for the make, model, and serial number identifier of the fan being used.|
|Environment ID|Tag for the specific environment that the data was captured in.|
|Mount ID|Tag for the installation instance of the sensor.|
|Collection ID|Tag for the data collection effort where multiple samples were captured.|
In addition, the following requirements were placed on the metadata:
- Environment ID
  - The description for this tag should include:
    - A description of the support the fan was stood on (e.g. work desktop), including the surface material (e.g. wood).
    - Any possible background vibration sources that could be picked up by the sensor.
    - A short description of the environment (e.g. conference room).
- Mount ID
  - The description for this tag should include:
    - The make and model of the mounting putty used.
    - A picture of the installation with detail on the attachment between the fan housing and the sensor board.
  - A new mount ID should be created for all mounting instances, even if the board is just being re-mounted.
In the exploration phase of this application, a few different fans were tested in different mounting configurations and environments, and the above variables were chosen as they were the primary factors that determine the differences between readings.
Data Collection: Sampling Methodology
At this point, we need to decide how to sample data for our application; this includes choosing how many samples to capture and defining what steps are needed to take the measurements. The methodology for this example application is summarized in the steps below:
- Record the metadata values for this data collection in a log. Alternatively, these can be stored directly as metadata variables in SensiML Data Capture Lab.
- Record and label a 30-second segment for each of the following fan states: fan off, speed 1, speed 2, and speed 3.
- Record tapping on the fan housing in the same spot for at most 15 seconds at a time; repeat this until you have 30 seconds of tapping data.
- With fan speed set at 1 (slowest setting), record sensor data as you gently shake the table by grabbing either the tabletop or one of the table legs and gently rocking back and forth for at most 15 seconds; repeat until you have 30 seconds of labeled shaking data.
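One way to keep track of progress through the protocol above is a small script that tallies the recorded segments per label. The sketch below is only an illustration: the label strings and the log format are hypothetical, and the 30-second target and 15-second limit are taken from the steps above.

```python
TARGET_SECONDS = 30            # total recording time wanted per label
MAX_DISTURBANCE_SEGMENT = 15   # tapping/shaking are recorded at most 15 s at a time
DISTURBANCE_LABELS = {"tapping", "shaking"}
LABELS = ["off", "speed_1", "speed_2", "speed_3", "tapping", "shaking"]


def seconds_remaining(segments):
    """Return the seconds still needed per label.

    `segments` is a list of (label, seconds) tuples from the collection log.
    An unknown label raises KeyError; an over-long disturbance segment
    raises ValueError.
    """
    totals = {label: 0 for label in LABELS}
    for label, seconds in segments:
        if label in DISTURBANCE_LABELS and seconds > MAX_DISTURBANCE_SEGMENT:
            raise ValueError(f"{label} segment longer than {MAX_DISTURBANCE_SEGMENT} s")
        totals[label] += seconds
    return {label: max(0, TARGET_SECONDS - total) for label, total in totals.items()}


# Hypothetical log partway through a collection run.
log = [("off", 30), ("speed_1", 30), ("tapping", 15), ("tapping", 10)]
remaining = seconds_remaining(log)
```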
This process was designed to be very simple to (1) constrain the problem by limiting the number of variables in the experiment and (2) limit the amount of time and effort that goes into a single data collection run. Keeping the data collection simple and small is crucial during initial model development to prove out your initial hypothesis and to work out kinks in the data collection process.
A single run of this process should generate enough data (and enough variation) to create a simple machine learning model that should work well under the constrained conditions of your experiment. To develop a more generalized model, you might perform the above collection with several different fan types or by introducing vibration interference that is realistic for your application.
Data Collection: Data Capture Tools
To record and label IMU data from the evaluation kit for use in the SensiML Analytics Studio, it is simplest to stream data to the SensiML Data Capture Lab directly. Follow the steps below to connect the SAMD21 board directly to Data Capture Lab:
Head to the SAMD21 ML Evaluation Kit Data Logger repository to download the data collection firmware.
Follow the "How to Configure, Compile, and Flash" instructions in the README document located in the repository to compile the data collection firmware for your desired sensor configuration that uses the SensiML output format.
Once you've flashed the data collection firmware, open up Data Capture Lab and create a new project.
Follow the "Usage with the SensiML Data Capture Lab" instructions in the data logger repository README document to connect the SAMD21 board to Data Capture Lab.
Alternatively, you can use the MPLAB Machine Learning plugin to collect data with MPLAB Data Visualizer and export it to the Data Capture Lab. Refer to the "Using the ML Plugin with SensiML" guide for more information on that process.
For more details on recording and labeling with the Data Capture Lab, please visit the SensiML documentation page.
At this point, you should have an initial dataset collected from your application. Let's now move into the Analytics Studio to generate our classifier model.
If you don't have your own data yet but still want to evaluate the model development process, download the dataset that was used in developing this guide from the releases page and import it into your project to follow along.
Open up the Analytics Studio in your web browser and log in.
Navigate over to the Prepare Data tab to create the query that will be used to train your machine learning model. Fill out the fields as shown in Figure 10; these query parameters will select only the samples in the training fold.
The SensiML Query determines what data from our dataset will be selected for training. We can use this to exclude test data from our training process.
Switch over to the Build Model tab to start developing the machine learning model pipeline.
In the basic settings, select the Query that was created in the last step, set Optimization Metric as f1-score, and set the Window Size to 200 samples.
We choose the f1-score to account for the class imbalance that is present in the example dataset.
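To see why the f1-score is a safer metric than accuracy on imbalanced data, consider the sketch below. It computes accuracy and an unweighted per-class average of F1 for a made-up, heavily imbalanced label set; how Analytics Studio averages the f1-score across classes is not stated in this guide, so the macro average used here is an assumption, as are the label names and counts.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives means precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical imbalanced set: 90 "speed_1" windows and 10 "tapping" windows,
# scored against a model that always predicts "speed_1".
y_true = ["speed_1"] * 90 + ["tapping"] * 10
y_pred = ["speed_1"] * 100
```

Here the always-"speed_1" model reaches 90% accuracy while never detecting a single tapping event; its macro F1 is below 0.5, which exposes the problem.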
We choose a Window Size of 200 (i.e., 0.5 seconds @ the 400 Hz IMU sample rate) here since this should be enough to detect changes in the vibration behavior we're interested in.
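The arithmetic behind the window size can be checked with a short sketch. The helper names are illustrative, and the count of windows per recording assumes complete, non-overlapping windows, which may not match the segmentation settings used in your pipeline.

```python
def window_duration_s(window_size, sample_rate_hz):
    """Length of one window in seconds."""
    return window_size / sample_rate_hz


def cycles_per_window(signal_hz, window_size, sample_rate_hz):
    """How many periods of a vibration component fit in one window."""
    return signal_hz * window_duration_s(window_size, sample_rate_hz)


def windows_in_recording(recording_s, sample_rate_hz, window_size):
    """Number of complete, non-overlapping windows in a recording."""
    return int(recording_s * sample_rate_hz) // window_size


duration = window_duration_s(200, 400)           # 0.5 s per window
cycles = cycles_per_window(30, 200, 400)         # periods of the ~30 Hz component
windows = windows_in_recording(30, 400, 200)     # windows in a 30 s segment
```

Under these assumptions each window spans 15 periods of the ~30 Hz component, and each 30-second labeled segment yields 60 windows.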
Fill out the advanced settings as shown in Figure 12.
Among the more important settings here is the Strip Mean setting - this ensures that the model removes the bias in the IMU readings, which can vary among individual sensors and installation instances.
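The idea behind mean stripping can be shown with a small sketch: subtracting each axis's mean over the window removes a constant offset, so two boards with different bias produce the same input to the model. This illustrates the concept only and is not SensiML's implementation; the axis name and sample values are made up.

```python
def strip_mean(window):
    """Subtract each axis's mean over the window.

    `window` maps an axis name to the list of samples in that window.
    """
    stripped = {}
    for axis, samples in window.items():
        mean = sum(samples) / len(samples)
        stripped[axis] = [s - mean for s in samples]
    return stripped


# Two hypothetical Z-axis windows: same vibration, different constant offset.
board_a = {"accel_z": [16384, 16390, 16378, 16384]}
board_b = {"accel_z": [16300, 16306, 16294, 16300]}

same_after_stripping = strip_mean(board_a) == strip_mean(board_b)
```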
Also, note the use of the Custom Feature Generator set. It is used to expand the AutoML search space to several feature classes beyond the basic features that could be useful for this application. Details about the SensiML feature generators can be found in the Analytics Studio documentation.
Finally, the # of Folds used for validation has been reduced to 3 since the example dataset is small and we want to ensure that each fold has enough data to provide an accurate estimate of model performance.
Once you've entered the pipeline settings, click the Optimize button. This step will use AutoML techniques to automatically select the best features and machine learning algorithms for the classification task given your input data. This process will usually take several minutes.
When the process is completed, take a moment to explore the models that were generated and verify they have good performance. Note that the rank 0 model is usually the best compromise among all the generated candidate models.
If the candidate models have poor accuracy or seem unnecessarily complex, chances are you need to go back and check your dataset. You might consider starting with a smaller dataset or even limiting the number of classes during initial development to suss out where your modeling is failing.
Once the Build Model optimization step is completed, navigate to the Download Model tab. Fill out the Knowledge Pack settings using the Pipeline, Model, and Data Source you created in the previous steps, and select Library as the output format (see Figure 13 for reference). Click the Download button to deploy your model.
The Library format, available to all SensiML subscription tiers, will generate a pre-compiled library for the generated machine learning model, along with a header file defining the user API.
You now have a compiled library for the SAMD21 containing your machine learning model that you can integrate into your own MPLAB X project. For more detailed information on the Analytics Studio, head over to SensiML's documentation page.
Knowledge Pack Integration
Let's take our SensiML library (i.e., knowledge pack) and integrate it into an existing MPLAB X project using the fan condition monitoring demo project as a template.
Use the MPLAB X project that accompanies this guide as a starting point for your project. This will save you the trouble of doing the hardware and project configuration yourself.
Download or clone the demo source code from the GitHub repository. In addition to the demo source code, this repository contains the MPLAB X project pre-configured for using a SensiML knowledge pack library.
Unzip the contents of the SensiML knowledge pack (the ZIP archive downloaded in the previous section) into the same root folder in which your MPLAB X project is located, so that it overwrites the existing knowledgepack folder.
Open up the samd21-iot-sensiml-fan-condition-demo.X project in the MPLAB X IDE.
In MPLAB X, open up the app_config.h under Header Files and change the sensor parameters to match those that were used in the development of your model. If these parameters don't match what was used during data collection, the model will produce unexpected results.
Now open up the main.c file under Source Files.
Scroll down inside the main while loop until you reach the section shown in Figure 14 that begins with a call to ringbuffer_get_read_buffer. This is the essence of the inferencing code: it simply calls into the SensiML knowledge pack via the sml_recognition_run function for every sample we get from the IMU.
If you're creating an application with different classes, make modifications to the LED code here to reflect your class mapping.
The sml_recognition_run function is the main entry point into the SensiML SDK; it internally buffers the samples we give it and makes an inference when it has enough data. For the project in this guide, an inference will be made every 200 samples; this corresponds to the Window Size parameter we set in the Build Model step of the model development in Analytics Studio. Note that sml_recognition_run will return a negative integer until it has enough data to make a prediction.
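The buffering behavior described above can be modeled with a short Python sketch. This is a toy stand-in written to illustrate the calling pattern only: the real firmware is C and calls sml_recognition_run, whose signature and internals are not reproduced here. The class, the threshold classifier, and the use of -1 as the "not ready" value are all assumptions.

```python
class WindowedRecognizer:
    """Toy model: buffers samples and classifies once per full window."""

    def __init__(self, window_size, classify):
        self.window_size = window_size
        self.classify = classify  # hypothetical callable: list of samples -> class id
        self.buffer = []

    def run(self, sample):
        """Buffer one sample; return -1 until a full window is available."""
        self.buffer.append(sample)
        if len(self.buffer) < self.window_size:
            return -1
        result = self.classify(self.buffer)
        self.buffer = []  # start collecting the next window
        return result


def collect_inferences(recognizer, samples):
    """Feed samples one at a time, keeping only the non-negative results."""
    results = []
    for sample in samples:
        ret = recognizer.run(sample)
        if ret >= 0:
            results.append(ret)
    return results


# Toy classifier: class 1 if the window's peak magnitude exceeds a threshold.
recognizer = WindowedRecognizer(200, lambda w: 1 if max(abs(s) for s in w) > 100 else 0)
results = collect_inferences(recognizer, [0] * 200 + [500] * 200 + [0] * 50)
```

In this model, 450 samples produce two classifications and the trailing 50 samples stay in the buffer. At the 400 Hz sample rate used in this guide, a 200-sample window means one classification every 0.5 seconds.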
Fan Condition Monitor Firmware Overview
For a description of the demo firmware included with this project, including operation, usage, and benchmarks, see the README file in the GitHub repository.
That's it! You now have a basic understanding of how to develop a fan condition monitoring application with SensiML and the SAMD21 ML evaluation kit.
For an in-depth guide on the data-driven design process, see SensiML's "Building Smart IoT Devices with AutoML" whitepaper.
To learn more about the SensiML Analytics Toolkit, including tutorials for other machine learning applications, go to the SensiML "Getting Started" page.