Utilizing Multiple Types of Features for Protein Classification!

January 12, 2021 by systems

Abhinav Angirekula

Most standard approaches to protein classification apply recurrent neural networks to protein sequences and predict the class from the sequence alone. This project set out to see whether other easily obtainable features could aid prediction. The results suggest that there is plenty of room for improvement.

Image for post

Data used: https://www.kaggle.com/shahir/protein-data-set

Now then, how about we jump right into the code?

We begin by fooling around with the data a little bit.

Image for post

Image for post

Now let’s merge these two DataFrames!

Image for post
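
For anyone following along without the notebook, here is a minimal sketch of what such a merge might look like in pandas. The column names (`structureId`, `classification`, `macromoleculeType`, `residueCount`, `sequence`) are assumptions modelled on the Kaggle dataset, the variable names are hypothetical, and the toy rows are invented purely for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables. Column names are assumed from the
# Kaggle dataset; the values are made up for illustration only.
df_char = pd.DataFrame({
    "structureId": ["101M", "102D", "103L"],
    "classification": ["OXYGEN TRANSPORT", "DNA", "HYDROLASE"],
    "macromoleculeType": ["Protein", "DNA", "Protein"],
    "residueCount": [154, 24, 167],
})
df_seq = pd.DataFrame({
    "structureId": ["101M", "102D", "103L", "104K"],
    "sequence": ["MVLSEGEWQL", "CGCAAATTTGCG", "MNIFEMLRID", None],
})

# Inner join on the shared key: only structures present in both tables survive.
merged = df_char.merge(df_seq, on="structureId", how="inner")
print(merged.shape)
```

An inner join is only one option here; `how="left"` would keep every row of the first table even when no sequence is available.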

Image for post — Dropping null values and only selecting protein data from “df”
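
A rough sketch of that filtering step, assuming the molecule type lives in a `macromoleculeType` column and the label in `classification` (both names are assumptions, and the rows below are invented):

```python
import pandas as pd

# Invented rows; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "structureId": ["101M", "102D", "103L", "104K"],
    "macromoleculeType": ["Protein", "DNA", "Protein", "Protein"],
    "classification": ["OXYGEN TRANSPORT", "DNA", "HYDROLASE", None],
    "sequence": ["MVLSEGEWQL", "CGCAAATTTGCG", None, "MNIFEMLRID"],
})

# Keep protein rows only.
proteins = df[df["macromoleculeType"] == "Protein"]

# Drop rows that are missing either the label or the sequence,
# then renumber the index so it runs 0..n-1 again.
proteins = proteins.dropna(subset=["classification", "sequence"]).reset_index(drop=True)
print(proteins)
```

Passing `subset=` to `dropna` keeps rows whose only missing values sit in columns the model never uses.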

Image for post

Next, let’s merge aaallll of the data into one single feature!

Image for post

Image for post
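
One plausible reading of "one single feature" is to turn every column into text and join them into a single string per protein. The sketch below does that under this assumption; the column list and the `combined` column name are hypothetical, and the notebook may well combine the data differently.

```python
import pandas as pd

# Invented rows; the chosen columns are an assumption.
df = pd.DataFrame({
    "sequence": ["MVLSEGEWQL", "MNIFEMLRID"],
    "residueCount": [154, 167],
    "experimentalTechnique": ["X-RAY DIFFRACTION", "SOLUTION NMR"],
})

feature_cols = ["sequence", "residueCount", "experimentalTechnique"]

# Cast every feature to a string, then join each row with spaces
# so that one text column carries all of the information.
df["combined"] = df[feature_cols].astype(str).apply(lambda row: " ".join(row), axis=1)
print(df["combined"].tolist())
```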

Next, it’s time to prepare the data for our final model!

Image for post
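
Preparing the data usually means turning class names into integer ids and holding out a test set. Here is a minimal sketch using only pandas and NumPy; the 80/20 split, the seed, and all variable names are assumptions rather than the notebook's actual settings.

```python
import numpy as np
import pandas as pd

# Ten invented samples with two made-up class labels.
df = pd.DataFrame({
    "combined": [f"SEQ{i}" for i in range(10)],
    "classification": ["HYDROLASE", "TRANSFERASE"] * 5,
})

# Map each class name to an integer id; `classes` keeps the reverse lookup.
labels, classes = pd.factorize(df["classification"])

# Shuffle the row positions with a fixed seed and hold out 20% for testing.
rng = np.random.default_rng(0)
order = rng.permutation(len(df))
n_test = int(0.2 * len(df))
test_idx, train_idx = order[:n_test], order[n_test:]

features = df["combined"].to_numpy()
X_train, X_test = features[train_idx], features[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]
print(len(X_train), len(X_test))
```

In practice, scikit-learn's `train_test_split` with `stratify=` does the same job and keeps class proportions equal across the two sets.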

Remember to import the tools we’ll need.

Image for post

And now, it’s time for what you’ve all been waiting for… the actual model!

Image for post

Time to train it!

Image for post

Image for post

Image for post

Okay, now we need to figure out how well our model performs.

Image for post
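
The simplest way to score a classifier is accuracy: take the most probable class for each sample and count how often it matches the true label. A small sketch with made-up probabilities (these numbers are not results from this project):

```python
import numpy as np

# Invented predicted class probabilities: 4 samples, 3 classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])
y_true = np.array([0, 1, 2, 2])

# The predicted class is the column with the highest probability.
y_pred = probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the true labels.
accuracy = float((y_pred == y_true).mean())
print(accuracy)
```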

The grand finale! Our confusion matrix…

Image for post

Image for post
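
If you want to rebuild a confusion matrix by hand, the idea is to count, for every true class (row), how often each class was predicted (column). Below is a minimal NumPy sketch with invented labels; the helper name `confusion_matrix` is our own here, although scikit-learn ships a ready-made `sklearn.metrics.confusion_matrix` as well.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Invented labels for three classes, not output from the model in this post.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, 3)

# Per-class recall: correct predictions divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(recall)
```

Off-diagonal cells show which classes get mistaken for which, which is often more informative than a single accuracy number.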

Welp. As you can see, our model struggles a lot. Still, I hope you guys learned something from this!

The full notebook can be found on my GitHub: https://github.com/AAbhi256
