In a previous article, I obtained quite satisfactory results estimating the compressive strength of concrete from 8 different parameters using various machine learning regression algorithms. In a follow-up article, I applied deep learning to the same data set and compared the performances.
In this article, I will walk through the steps involved in implementing a machine learning regression analysis in Streamlit and then deploying it on AWS EC2.
Starting with definitions, Streamlit is an open-source Python library that makes it easy to generate and share beautiful, custom web apps for machine learning and data science. It is possible to build and deploy (locally) powerful data apps in just a few minutes.
Amazon Web Services (AWS) offers reliable, scalable, and inexpensive cloud computing services. The term cloud computing services is a broad category and it encompasses the IT resources provided over the internet. Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
I used a Kaggle dataset. As explained on the website, the first seven parameters are ingredients of the concrete mixture, measured in kg per m3, and the eighth, “Age”, is measured in days (1 to 365). There are 8 quantitative input variables and 1 quantitative output (the compressive strength of the concrete, in MPa), with 1030 instances and no missing data.
Data Set
Pandas’ describe() function gives basic statistical details of the numerical columns. The minimum and maximum values will be important when getting input from the user in Streamlit.
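As a minimal sketch of this step, using a toy DataFrame in place of the full Kaggle CSV (the real analysis would start from something like pd.read_csv("concrete_data.csv"); that filename and the column names here are assumptions):

```python
import pandas as pd

# Toy subset illustrating the idea; the real analysis reads the full
# Kaggle CSV and includes all eight input columns.
df = pd.DataFrame({
    "cement": [540.0, 332.5, 198.6],   # kg per m3 (column names assumed)
    "water":  [162.0, 228.0, 192.0],
    "age":    [28, 270, 90],           # days
    "csMPa":  [79.99, 40.27, 28.02],   # compressive strength target
})

stats = df.describe()
# The min/max rows give the bounds to use for the Streamlit input widgets.
print(stats.loc[["min", "max"]])
```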
I checked the correlations of all the parameters (columns) and noticed that cement has the strongest positive correlation with compressive strength, while water has the strongest negative correlation. Fly ash has the least effect on the compressive strength of the concrete.
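The correlation check can be sketched as follows. The toy data below is constructed to mirror the pattern described (cement positive, water negative, fly ash near zero) and is not the real dataset; in practice you would call df.corr() on the full DataFrame:

```python
import pandas as pd

# Synthetic values arranged to mirror the observed correlation pattern.
df = pd.DataFrame({
    "cement": [100.0, 200.0, 300.0, 400.0],
    "water":  [200.0, 180.0, 160.0, 140.0],
    "flyash": [0.0, 100.0, 100.0, 0.0],
    "csMPa":  [20.0, 30.0, 40.0, 50.0],  # target (column name assumed)
})

# Correlation of every input with the target, strongest positive first.
corr = df.corr()["csMPa"].drop("csMPa").sort_values(ascending=False)
print(corr)
```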
Machine Learning
In the original article, I applied five different algorithms, but here I will use only XGBoost, which gave the highest R² score. Remember, the target here is to deploy the analysis with Streamlit on AWS EC2.
As the last step, I imported pickle and saved the XGBoost model used for the analysis. Python’s pickle module is very useful when working with machine learning algorithms: you can save a trained model and make new predictions at a later time without having to rewrite everything or retrain the model.
Streamlit
Now that we have the model saved as a .pkl file, we can start preparing the Streamlit part. You can use any text editor, but you must save the file with a .py extension.
Below, you can see that the first steps are importing the libraries and adding a photo and a title to the app.
The next step is getting the input values from the user. The left-most part of the screen (the sidebar) will be reserved for the user to enter the values (all numeric). From those values, a dictionary named my_dict will be formed, and this dictionary will be converted to a Pandas DataFrame.
The final steps are opening and reading the XGBoost model prepared and saved earlier in the main Python file. The model will make a prediction, and the app will display the predicted compressive strength value.
The next stage is saving the code and running it from Anaconda Prompt or a similar bash shell terminal. Follow these steps:
Just make sure that all the required files are in the same folder. In this case, you will need these files to run the app successfully:
· Streamlit file saved with a .py extension.
· The model saved as a pickle file.
· The photo (not strictly needed, but I used one in the app, so I need it).
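With those files in place, running the app locally boils down to two commands (the folder path and app file name are placeholders):

```shell
# From the folder containing the .py file, the .pkl model, and the photo:
cd path/to/project
streamlit run concrete_app.py
```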
Once you run the code, you will get the screen below:
Your app is running, but only locally. If you want to make your app accessible from other computers, AWS deployment may be a good alternative.
AWS EC2 Deployment
First, you must open an AWS account and sign in. Then, go to the EC2 page and launch an instance as shown below:
You will only need a basic machine for this application, so choose a free-tier configuration and accept the default setup until step 6 (Configure Security Group). There, add two Custom TCP rules, set the port ranges to 8501 and 8502 (the ports Streamlit serves on), and set Source to Anywhere (all marked in red).
Then go to the “Review and Launch” step, check everything, and launch the instance. You will be asked to create or choose a key pair here; save the .pem file to the folder where all the Streamlit files are located.
Follow these steps:
· Open Git Bash in the above-mentioned file folder,
· Use the following command to change the permissions of your .pem file from writable to read-only:
· Connect to the instance by running the ssh commands (shown in red) on Git Bash:
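The two commands above might look like this; the key file name, the user name, and the public DNS are placeholders copied from the EC2 console’s “Connect” instructions:

```shell
# Restrict the key file to read-only for the owner:
chmod 400 mykey.pem

# Connect to the instance; user and host come from the EC2 console
# (values below are placeholders):
ssh -i mykey.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```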
Once you are connected, install Python 3.7 (or the version that you will be using) and git on the EC2 instance.
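Assuming an Amazon Linux instance (so yum is the package manager), the installation is:

```shell
# Update the package index, then install Python 3 and git.
sudo yum update -y
sudo yum install -y python3 git
```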
If you want to keep the app running on EC2, make a repo on GitHub and save the files there. Copy the https address of this repo.
Now, clone your repo to EC2:
To check that the cloning succeeded, run ls, and then cd into the repo folder.
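For example (the repo URL and folder name are placeholders for your own):

```shell
# Clone the repo using the HTTPS address copied from GitHub.
git clone https://github.com/<username>/<repo>.git
ls          # the repo folder should now be listed
cd <repo>
```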
The next step is to create a virtual environment and activate it:
Next, install the packages required by the Streamlit app. In my case, the following were installed:
Once you installed all the necessary packages, you can run the app:
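The command sequence for these three steps might look like this; the package list is illustrative (install whatever your app imports), and the app file name is a placeholder:

```shell
# Create and activate a virtual environment.
python3 -m venv venv
source venv/bin/activate

# Install the app's dependencies (list assumed).
pip install streamlit pandas xgboost

# Run the app.
streamlit run concrete_app.py
```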
You will get the following message in the terminal and you will be able to view your Streamlit app in your browser.
To keep the app running, you need to install tmux and generate a new tmux session and run the app again.
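The tmux steps might look like this (again assuming yum on Amazon Linux; the session name is arbitrary):

```shell
sudo yum install -y tmux       # install tmux
tmux new -s streamlit_session  # start a new named session
streamlit run concrete_app.py  # run the app inside the session
# Detach with Ctrl+b then d; the app keeps running in the background.
```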
Even if you close Git Bash, the app will continue to run as long as the EC2 instance is running.
Conclusion
In this article, I detailed the steps involved in implementing a machine learning regression analysis in Streamlit and deploying it on AWS EC2.