In a complete project, many steps come before feature extraction. The main ones can be grouped into four macro phases, each with its own pitfalls that must be recognized and solved in order to obtain a well-performing machine learning model:
- Dataset Analysis
- Preprocessing
- Feature Extraction & Feature Selection
- Normalization
Our goal is to extract features from a generic signal, so I won’t go through all the steps.
For this example I chose, from a public dataset, an accelerometer acquisition used in the “Human Activity Recognition” experiment, whose aim is to determine the activity a person is performing by using a mobile phone.
As a result of this piece of code you should get a dataframe with an accelerometer acquisition.
In this case we are not interested in checking whether there are outliers, whether the data is already normalized, or whether something is wrong with it.
All of these analyses are essential when the data is needed for a real project, but since I only want to show how to extract features from a generic signal, I am not evaluating the reliability, integrity and consistency of the data.
It might also be interesting to divide the signal into windows and extract the features for each window, but I prefer to avoid doing too much at once.
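For readers who do want to try windowing, here is a minimal sketch of how a signal could be split into fixed-length windows. The function name `split_into_windows`, the parameters `window_size` and `step`, and the synthetic signal are all assumptions made for illustration; they are not part of the original script.

```python
import numpy as np

def split_into_windows(signal, window_size, step):
    """Split a 1-D signal into fixed-length windows.

    A step smaller than window_size produces overlapping windows.
    A trailing chunk shorter than window_size is dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    starts = range(0, len(signal) - window_size + 1, step)
    return np.array([signal[s:s + window_size] for s in starts])

# Synthetic stand-in for one accelerometer axis (not the article's dataset)
signal = np.arange(10, dtype=float)

# Windows of 4 samples with 50% overlap
windows = split_into_windows(signal, window_size=4, step=2)
print(windows.shape)
```

Each row of `windows` could then be passed to a feature-extraction function, giving one feature vector per window.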
The second part of the code defines a function that takes a list of values as input and returns a table with the features.
The features to be analyzed fall into two domains:
- Time domain: describes how a signal changes over time
- Frequency domain: describes how much of the signal lies within each frequency band over a range of frequencies
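As a rough illustration of the time-domain side, the sketch below computes a handful of common statistics from a list of values. The function name `time_domain_features` and the particular features chosen here are assumptions for the example; the original script may use a different set.

```python
import numpy as np

def time_domain_features(values):
    """Return a few common time-domain statistics of a 1-D signal."""
    x = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),                    # population standard deviation
        "min": float(np.min(x)),
        "max": float(np.max(x)),
        "peak_to_peak": float(np.max(x) - np.min(x)),
        "rms": float(np.sqrt(np.mean(x ** 2))),     # root mean square
        "abs_energy": float(np.sum(x ** 2)),        # sum of squared samples
    }

# Tiny synthetic signal, only to show the call
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
for name, value in features.items():
    print(name, value)
```

Returning a dictionary keeps the feature names attached to the values, which makes it easy to turn the result into a table row later.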
For frequency-domain features, you must first obtain the FFT of the signal and the corresponding power spectrum.
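A minimal sketch of that step with NumPy could look like the following. The function name `frequency_domain_features`, the 50 Hz sampling rate, the synthetic 5 Hz sine wave and the three features are assumptions made for illustration. The power spectrum is scaled here as |X|²/N, which is only one of several conventions in use.

```python
import numpy as np

def frequency_domain_features(values, sampling_rate):
    """Compute a one-sided power spectrum and a few features from it."""
    x = np.asarray(values, dtype=float)
    x = x - np.mean(x)                      # remove the DC offset
    spectrum = np.fft.rfft(x)               # FFT of a real-valued signal
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate)
    power = np.abs(spectrum) ** 2 / len(x)  # assumed scaling: |X|^2 / N

    total_power = float(np.sum(power))      # sum over the one-sided bins
    if total_power > 0.0:
        centroid = float(np.sum(freqs * power) / total_power)
    else:
        centroid = 0.0                      # flat (constant) signal
    return {
        "dominant_frequency": float(freqs[np.argmax(power)]),
        "spectral_centroid": centroid,
        "total_power": total_power,
    }

# Synthetic 5 Hz sine sampled at an assumed 50 Hz for 2 seconds
fs = 50.0
t = np.arange(100) / fs
signal = np.sin(2 * np.pi * 5.0 * t)

features = frequency_domain_features(signal, fs)
print(features)
```

With a real accelerometer signal, the sampling rate has to come from the dataset documentation, otherwise the frequency axis will be wrong.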
This function therefore yields a fair number of features from a single input signal.
Many other features can be exploited and easily added to the script without much modification.
Once the features for each analyzed signal have been obtained, they can be normalized so that a machine learning algorithm does not give too much weight to any particular one. It is also recommended to reduce the number of features with techniques such as PCA and PCC (Pearson correlation coefficient), which lowers the computational load.
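To make these two ideas concrete, here is a small sketch that z-score normalizes a feature matrix and then drops features that are highly correlated with one already kept. It covers only the PCC idea, not PCA. The function names, the 0.95 threshold and the toy matrix are assumptions for illustration.

```python
import numpy as np

def zscore_normalize(feature_matrix):
    """Scale each column (feature) to zero mean and unit variance."""
    X = np.asarray(feature_matrix, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (X - mean) / std

def drop_correlated_features(feature_matrix, threshold=0.95):
    """Return indices of columns to keep, skipping any column whose absolute
    Pearson correlation with an already kept column reaches the threshold."""
    X = np.asarray(feature_matrix, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # columns are the variables
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

# Toy matrix: rows are signals, columns are features.
# Column 1 is exactly twice column 0, so it carries no new information.
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 6.0],
    [4.0, 8.0, 1.0],
])

X_norm = zscore_normalize(X)
kept = drop_correlated_features(X_norm, threshold=0.95)
print(kept)
```

In a real project the mean and standard deviation should be computed on the training set only and then reused on the test set, so that no information leaks between the two.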
In this case, the output data structure is not even designed to be conveniently fed into classic ML algorithms.
The intent of this article was to provide a very simple and immediate guide for those who are just starting out with extracting features from any type of signal (vibration, acoustic, etc.). Unfortunately, many topics have been left unexplained or not considered at all.
If any part is unclear or needs a more detailed explanation, please let me know so I can revise it by simplifying or expanding it. Thank you.