Let's dive into an example from one of my projects, in which I used VADER to analyze the notes from the Mass Mobilization Project, which provides data on protests from all around the world.
Step 1: Create a corpus from the text column
corpus = list(df['notes'])
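Step 2: Instantiate the sentiment analyzer
# Assumes NLTK's VADER implementation (run nltk.download('vader_lexicon') once beforehand)
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Instantiate SIA
sia = SentimentIntensityAnalyzer()
# Get polarity scores for the first document
sia.polarity_scores(corpus[0])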
Step 3: Create a data frame with one row of scores per document in your corpus
dicts = []  # collects one dict of scores per document
for text in corpus:
    scores = sia.polarity_scores(text)
    scores['text'] = text
    dicts.append(scores)
df_new = pd.DataFrame(dicts)
df_new.head()
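The loop above passes every value in the notes column straight to the analyzer, which may fail if a note is missing (pandas stores missing text as NaN, which is a float, not a string). Here is a minimal sketch of a more defensive version. It is not the project's actual code: the helper name build_score_frame and the empty-string fallback are assumptions made for illustration, and score_fn stands for any callable that returns a dict of scores, such as sia.polarity_scores.

```python
import pandas as pd


def build_score_frame(texts, score_fn):
    """Return one row of scores per text, keeping the original index."""
    # Treat missing notes as empty strings so the scorer always gets a str.
    clean = pd.Series(texts).fillna("").astype(str)
    rows = [{**score_fn(text), "text": text} for text in clean]
    return pd.DataFrame(rows, index=clean.index)


# Stand-in scorer so the sketch runs without VADER installed;
# with the real analyzer you would pass sia.polarity_scores instead.
def fake_scores(text):
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}


demo = build_score_frame(["peaceful march", None], fake_scores)
print(demo)
```

With the objects from the steps above, this would be called as build_score_frame(df['notes'], sia.polarity_scores). Because the index is preserved, the result lines up row for row with df.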
This gives you the following data frame:
Let's sort the data frame by positive and negative scores and look at the results:
df_new.sort_values(by='pos', ascending=False).head(10)
df_new.sort_values(by='neg', ascending=False).head(10)
protesterviolence and stateresponse are the two target variables used for EDA.
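The grouped plots in this section need protesterviolence and stateresponse in the same data frame as the sentiment scores, and the Step 3 loop on its own only stores the scores and the text. Here is a minimal sketch of one way to bring them together, assuming df_new has one row per row of df, in the same order. The toy values are invented for illustration and are not taken from the Mass Mobilization data.

```python
import pandas as pd

# Toy stand-ins for df and df_new (all values are invented).
df = pd.DataFrame({
    "notes": ["a", "b", "c", "d"],
    "protesterviolence": [0, 1, 0, 1],
    "stateresponse": ["ignore", "arrests", "ignore", "shootings"],
})
df_new = pd.DataFrame({
    "neg": [0.1, 0.4, 0.0, 0.6],
    "pos": [0.3, 0.0, 0.2, 0.1],
    "compound": [0.5, -0.6, 0.3, -0.9],
    "text": ["a", "b", "c", "d"],
})

# Copy the target columns across by position, which is only valid
# because df_new was built from df["notes"] in order.
for col in ["protesterviolence", "stateresponse"]:
    df_new[col] = df[col].to_numpy()

# Select the numeric score columns before averaging so the text column
# is never passed to mean().
means = df_new.groupby("protesterviolence")[["pos", "neg", "compound"]].mean()
print(means)
```

Selecting the score columns before calling mean() also avoids the error that recent pandas versions raise when asked to average a text column.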
Below are the average positive, negative, and compound scores by protester violence.
df_new.groupby('protesterviolence')[['pos', 'neg', 'compound']].mean().plot(
    kind='barh',
    title='Average Positive, Negative & Compound Scores for Protester Violence',
    figsize=(15, 10),
    color=['blue', 'orange', 'red'])
The average compound score, shown in red, leans most strongly negative for protests with protester violence (protesterviolence = 1).
Below are the average positive, negative, and compound scores by state response.
df_new.groupby('stateresponse')[['pos', 'neg', 'compound']].mean().plot(
    kind='barh',
    title='Average Positive, Negative & Compound Scores for State Response',
    figsize=(15, 10),
    color=['blue', 'orange', 'red'])
Here the most negative average compound scores, shown in red, belong to harsh state responses such as shootings, killings, beatings, and arrests.
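To make "leaning negative" concrete for individual notes, the compound score can be bucketed into labels. The cut-offs of 0.05 and -0.05 used below are the thresholds commonly suggested in the VADER documentation. The function name label_compound is hypothetical, and the sample scores are invented rather than taken from the project.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Invented example scores, not taken from the protest data.
for score in (0.62, 0.01, -0.48):
    print(score, label_compound(score))
```

On the data frame built earlier, this could be applied as df_new['compound'].apply(label_compound).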
Conclusion:
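· Based on the above analysis, VADER appears to be an effective sentiment analysis tool for this kind of text: its average scores turn more negative for protests with protester violence and for harsh state responses.
· The resulting data frame, with its positive and negative scores, can be used as input for further modeling.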
Resources:
https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/