The lockdown brought out many skills in people; a few turned towards cooking, a few to musical instruments, and a few tried to grow plants. Our household belongs to the last cohort. While I was growing up, we had the usual holy basil, aloe, and an occasional marigold growing in pots, but now was the time to expand the repertoire. Thus began the hunt on Amazon and in local nurseries for potting soil, pots, and other paraphernalia.
Mine is a massive country, diverse not only in cultures, people, and food but also in flora, fauna, and types of soil. Depending on where a product is shipped from, there can be differences, especially when the product is soil. There are alluvial, clay, red, laterite, black, peaty, and many more types of soil in the country. It goes without saying that different soils have different chemical and physical properties, and only certain types of plants are suitable for each soil.
Long story short, my mother was planting everything in every type of soil and supplementing it with some kitchen-waste fertiliser, but the saplings either grew slowly or developed spots and droopy leaves. Also, pigeons (the flying rats) are a menace in any large city. There were three things that my mother could do:
1. Identify the soil type and plant only relevant seeds in it.
2. Identify any diseases early on and take corrective measures.
3. Take care of the pigeons (Evil laugh) 😈
She was spending hours on YouTube trying to find ways to take care of certain plants and, truth be told, finding info on the internet can be quite overwhelming. I thought of helping her, but my knowledge of botany is laughable, so I decided to summon machine learning.
Dataset
The first issue was to find data for the soils. As I mentioned, there are too many types of soil given the different climates and geographies that exist in the country. Just when I was thinking of moving to a more numerical analysis based on the characteristics of the soil, I found a soil dataset. It had images of alluvial, clay, red, and black soils. It seemed good enough.
Perfect! Let’s start the analysis.
Modelling
After building a few convolutional nets, I have created a template that I use to set a baseline for my models. Many times a small tweak in the hyperparameters is all it takes to get decent results.
I use the split-folders package to divide the data (containing one subfolder per class) into train, validation, and test sets. After defining the three sets, I built a simple CNN model with a series of convolutional and max-pooling layers with dropout in between. The later layers of the model consisted of Flatten and Dense layers, ending in a softmax activation.
I could see across the epochs that I was getting respectable validation accuracy. Let’s plot the accuracy and loss across the epochs.
That looks fine! Now I’ll evaluate the model on the test data and plot a confusion matrix to check the misclassifications. For a good model, most of the observations should lie along the diagonal.
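A baseline of this shape could be sketched in Keras roughly as follows. The input size, filter counts, dropout rates, and the folder names in the comment are illustrative assumptions, not the exact values from the template; only the layer types and the four soil classes come from the description above.

```python
from tensorflow.keras import layers, models

# Splitting the image folders beforehand with split-folders would look like
# this (folder names, seed, and ratio are assumptions):
#   import splitfolders
#   splitfolders.ratio("soil_images", output="soil_split",
#                      seed=42, ratio=(0.8, 0.1, 0.1))


def build_baseline_cnn(input_shape=(150, 150, 3), num_classes=4):
    """Small CNN: Conv/MaxPool blocks with dropout, then Flatten and Dense."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # One output per soil class: alluvial, black, clay, red
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_baseline_cnn()
```

Calling `model.summary()` afterwards is a quick way to see how many parameters the Flatten-to-Dense step contributes, which is usually where most of the model size comes from.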
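To make the "diagonal" idea concrete, here is a minimal sketch of how a confusion matrix is tallied. The labels match the four soil classes, but the predictions are made up purely for illustration and are not results from the model.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


labels = ["alluvial", "black", "clay", "red"]
# Made-up ground truth and predictions, for illustration only
y_true = ["alluvial", "alluvial", "black", "clay", "clay", "red", "red", "red"]
y_pred = ["alluvial", "clay", "black", "clay", "clay", "red", "red", "black"]

cm = confusion_matrix(y_true, y_pred, labels)

# Correct predictions sit on the diagonal; everything else is a misclassification
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)
```

Here six of the eight toy samples land on the diagonal, so the accuracy is 0.75; the two off-diagonal entries show exactly which class was mistaken for which.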
It seems we are fine with the first part of the problem.
Dataset
As usual, I didn’t have a dataset, so I resorted to the internet, and a quick search led me to the Plant Village Disease Classification Challenge, which has image datasets for 38 different classes. It is true that many of the plants in the dataset aren’t growing in my home, but something is better than nothing.
The dataset had around 100k images, which would have been too much for my humble machine, so I decided to use Kaggle’s GPU capabilities. The data is also hosted on Kaggle, which makes it much simpler to import into the models. There were many kernels available for this data, but I found it a little weird that most people used the training and validation data to train the model and then tested on the ‘already consumed’ validation data. I saw a fundamental problem here and decided to build my own model.
Before starting, I split the train data into train and validation sets. There was a test set already, so I was set with the three required datasets: train, validation, and test.
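The idea of carving a validation set out of the training data, class by class, can be sketched as below. The 20% fraction, the seed, and the file names are hypothetical; the actual split used for the leaf model may differ.

```python
import random


def split_train_validation(files_by_class, val_fraction=0.2, seed=42):
    """Hold out a fraction of each class's files for validation."""
    rng = random.Random(seed)
    train, val = {}, {}
    for label, files in files_by_class.items():
        shuffled = sorted(files)      # sort first so the result is reproducible
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * val_fraction))
        val[label] = shuffled[:n_val]
        train[label] = shuffled[n_val:]
    return train, val


# Hypothetical file names standing in for the real image folders
files_by_class = {
    "Tomato___healthy": [f"healthy_{i}.jpg" for i in range(10)],
    "Tomato___Late_blight": [f"blight_{i}.jpg" for i in range(5)],
}
train, val = split_train_validation(files_by_class)
```

Splitting per class keeps the class proportions similar in both sets, and the untouched test set is then only used once, for the final evaluation.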
Modelling
Before starting to model, I remembered that my mother would have to use these models in real life, and Keras models are often heavy (~300 MB), which increases the slug size of a Heroku app. This is an issue that should be mitigated in the modelling process itself.
I decided to use MobileNet which would be really small in size after compilation and training.
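A MobileNet-based classifier for the 38 leaf classes might be set up along these lines. The frozen backbone, the pooling head, the 224×224 input size, and the optimiser are assumptions made for this sketch; only the choice of MobileNet and the number of classes come from the text.

```python
import tensorflow as tf


def build_mobilenet_classifier(num_classes=38, input_shape=(224, 224, 3),
                               weights="imagenet"):
    """MobileNet backbone with a small softmax head for the leaf classes."""
    base = tf.keras.applications.MobileNet(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # assumption: train only the new head at first

    # Images are expected to be scaled with
    # tf.keras.applications.mobilenet.preprocess_input before being fed in.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Passing `weights=None` builds the same architecture with random weights, which is handy for checking shapes without downloading the pretrained ImageNet weights.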
The accuracy and the loss across the epochs looked promising.
I was fairly confident that the predictions on the test set would not be too bad, and I was right.
You can see that the diagonal of the matrix holds most of the counts and the F1 scores (harmonic mean of precision and recall) are close to 0.9 for most of the classes. I think it is a decent enough model.
The primary user of the model would be my mother, and I am not expecting her to fire up a laptop to use it, so I had to make things a little easier.
Enter Streamlit!
Before developing a UI, I knew that I had two models: one for soil classification and the other for plant disease. The second one is quite small (12 MB), but the former (300 MB) needs to be converted into a lighter format suitable for IoT devices. Thankfully, TensorFlow lets us do so.
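For reference, per-class precision, recall, and F1 can be read straight off a confusion matrix. The sketch below uses a made-up two-class matrix for illustration, not the numbers from the leaf model.

```python
def per_class_scores(cm):
    """Return (precision, recall, f1) per class from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[row][k] for row in range(n))  # column total
        actual_k = sum(cm[k])                              # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


# Made-up 2-class matrix, for illustration only
cm = [[8, 2],
      [1, 9]]
scores = per_class_scores(cm)
```

Because the harmonic mean is pulled towards the smaller of the two values, a class only gets a high F1 when both its precision and its recall are high.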
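The conversion can be done with the TensorFlow Lite converter. The snippet below is a minimal sketch using a tiny stand-in model; the quantisation setting and the output file name are assumptions, and the real soil model would be loaded in place of `demo_model`.

```python
import tensorflow as tf


def convert_to_tflite(model, quantize=True):
    """Convert a Keras model to TensorFlow Lite flatbuffer bytes."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    if quantize:
        # Default optimisations quantise the weights, which shrinks the file
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


# Tiny stand-in model; in practice this would be the trained soil classifier
demo_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])
tflite_bytes = convert_to_tflite(demo_model)

# Writing the result to disk (hypothetical file name):
# with open("soil_model.tflite", "wb") as f:
#     f.write(tflite_bytes)
```

The converted file is then loaded with `tf.lite.Interpreter` instead of the usual Keras `load_model`, so the prediction code in the app has to change accordingly.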
Now we are ready for deploying the app on Streamlit.
Deployment
Streamlit apps are quite easy to build. All I had to do was put app.py, setup.sh, requirements.txt, a Procfile, the models, and the other things I was using in a single folder. Please take a look at the GitHub repo here.
If you have all the files in your GitHub repo, you can use the (free) Streamlit Sharing service: specify your app.py and, if everything goes well, you will have an app that you can access from anywhere (don’t know about autocratic countries).
Please check the app deployed here — https://share.streamlit.io/prashantmdgl9/soil-analysis/main/app.py
or at https://soil-analyser.herokuapp.com/
Resources
Compiling a list of resources for easy access.
- The soil dataset is present here.
- The diseased leaf dataset is present here.
- The deployed app can be accessed here(Streamlit) and here(Heroku).
- The GitHub for the deployed app is here(Streamlit) and here(Heroku).
- The code for training the soil model is present here(GitHub).
- The code for training the leaf model is present here(Kaggle).
1. My mother’s phone takes high-resolution photos, and whenever she took a photo and uploaded it to the app, the app would crash. I had missed this because my test images were quite small. I could mitigate the issue by setting a higher recursion limit.
sys.setrecursionlimit(10000)
2. She liked the fact that the app suggests which types of plants she should grow in the soil.
3. She wasn’t thrilled with the results of the plant disease model since, according to it, almost all of her plants needed an operation theatre, and I understand that perfectly.
I had the faulty assumption that diseased leaves would look somewhat alike across species. That seems quite far-fetched, for there can be leaves that naturally have characteristic spots on them.
There is an acute need to update this dataset with the plants that she grows. For now, she can take care of small tomato plants with the help of this app. If the leaf is green, it will be classified as healthy, and in many cases scab, rust, blight, and mildew will be correctly classified. For those species that have natural spots, yellow lines, or brown spots on them, I need to update the data over time.
For now, if even a few plants get the right treatment because of this effort, then “Not Bad”!