A quick comparison of the Brazilian and Portuguese constitutions through Text Mining in R
This is a quick, mostly visual comparison of the current Brazilian and Portuguese Constitutions. It uses a few Text Mining concepts to look at similarities and differences between the two documents.
What you’ll see here:
- Text Mining
- Most frequent words
- Word clouds
For this analysis, I used Text Mining concepts and packages in R, and the steps in a nutshell are:
- Importing the text of the Constitution
- Removing stopwords, i.e. classes of words such as articles and prepositions that don’t add much meaning to the text
- Tokenizing, i.e. splitting the sentences into individual words
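The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code behind this post: the two lines of text are a toy sample, and `stop_words_pt` is a tiny hand-written stopword list (a real analysis would use a complete Portuguese list from a text mining package). The sketch tokenizes first and filters afterwards, since dropping stopwords is simplest once the text is already split into words.

```r
# Step 1: import. With a real file this could be something like
# raw_text <- readLines("constitution.txt", encoding = "UTF-8")  # hypothetical file name
# A toy sample is used here so the sketch is self-contained.
raw_text <- c(
  "a lei federal e a lei dos estados",
  "os municípios e a união"
)

# Hypothetical, deliberately tiny stopword list (articles, prepositions, "e").
stop_words_pt <- c("a", "o", "e", "os", "as", "de", "do", "da", "dos", "das")

# Step 2: tokenize by lowercasing and splitting on whitespace and punctuation.
tokens <- unlist(strsplit(tolower(raw_text), "[[:space:][:punct:]]+"))
tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split

# Step 3: remove stopwords and keep one word per row.
words <- data.frame(
  word = tokens[!tokens %in% stop_words_pt],
  stringsAsFactors = FALSE
)
print(words)
```

The result is a one-column dataframe with one row per remaining word, which is the shape the later counting steps work on.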
After that, we get a dataframe that looks like this:
Now we are ready to construct some basic plots for each one of the documents.
Brazilian Constitution of 1988
In a country hardened by years of military dictatorship, changes in leadership and somewhat freshened by the recent movement for direct elections “Diretas Já”, there was a clear need for a less authoritarian Constitution, one that, at least on paper, would give the power back to the people.
And that’s exactly what the Brazilian Constitution of 1988 tried to accomplish. Although it is criticized for being lengthy and overly analytical, the current Federal Constitution, enacted in 1988, was definitely a milestone in the country’s democratic history.
After removing stopwords we are left with:
Total words: 25042
Total unique words: 4700
% of unique words: 18.77
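For reference, figures like these come from simple counting. The sketch below assumes a character vector of words left after stopword removal (a toy vector here, not the real data) and uses only base R. The variables `pct_br` and `pct_pt` apply the same formula to the totals reported in this post.

```r
# Toy vector of words remaining after stopword removal (not the real data).
words <- c("lei", "federal", "lei", "estados", "municípios", "união")

total_words  <- length(words)           # every occurrence counts
unique_words <- length(unique(words))   # each distinct word counts once
pct_unique   <- round(100 * unique_words / total_words, 2)

# The same formula applied to the totals reported in the post.
pct_br <- round(100 * 4700 / 25042, 2)  # Brazilian Constitution
pct_pt <- round(100 * 3107 / 15111, 2)  # Portuguese Constitution

# Frequency ranking: the most common words first (top 10 at most).
top_words <- head(sort(table(words), decreasing = TRUE), 10)
print(top_words)
```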
The top 10 most common words in the Brazilian Constitution are:
Some quick notes:
- “federal”, “nacional” and “união” refer to the top level of the federation, the federal government (the “União”).
- “estados” and “municípios” refer to the other two levels of government, the states and the municipalities.
- It’s interesting to note that the Portuguese equivalents of values and concepts such as social, justice, resources and people don’t appear until the 12th position.
With the help of the wordcloud2 package, we construct a word cloud in the beautiful colors of the Brazilian flag and with the silhouette of Brazil’s map:
As in any word cloud, the bigger the word, the more common it is. The color in this case is random.
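A cloud along these lines could be built roughly as follows. This is a sketch under assumptions rather than the exact code used: the counts in `freq` are made up, the hex values in `flag_colors` are approximations of the flag colors, and `brazil_map.png` is a hypothetical silhouette image. The call to `wordcloud2()` is guarded so the snippet still runs when the package or the image is missing.

```r
# Toy frequency table: the words appear in the post, the counts are made up.
freq <- data.frame(
  word = c("federal", "lei", "nacional", "união", "estados", "municípios"),
  freq = c(60, 50, 40, 30, 25, 20),
  stringsAsFactors = FALSE
)

# Approximate Brazilian flag colors (green, yellow, blue); values are assumptions.
flag_colors <- c("#009C3B", "#FFDF00", "#002776")

# Assign one flag color to each word at random.
set.seed(1988)
word_colors <- sample(flag_colors, nrow(freq), replace = TRUE)

mask_path <- "brazil_map.png"  # hypothetical silhouette image

if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(
    freq,
    color = word_colors,
    # Use the silhouette only if the image file is actually there.
    figPath = if (file.exists(mask_path)) mask_path else NULL
  )
  # Printing `cloud` in an interactive session renders the widget.
}
```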
Portuguese Constitution of 1976
After the leftist coup of 1974, the Portuguese people also needed a new Constitution, adopted in 1976, as the previous one was over 40 years old. Although Portugal almost always had more stability in terms of government and politics, the 1976 document carried several innovations such as a clear definition of parliament, prime minister, political parties and elections, and an independent judiciary system.
Text-wise, the Portuguese Constitution is also pretty wordy, although not quite as wordy as the current Brazilian document:
Total words: 15111
Total unique words: 3107
% of unique words: 20.56
The top 10 most common words in the current Portuguese Constitution are:
- “Lei” is understandably the most common word, as it is the Portuguese word for “law”.
- The other words suggest a greater concern with the common interest, at least in comparison with the Brazilian Constitution. Words such as “assembleia”, “direito/direitos” and “cidadãos” are very prominent and among the top 10.
Here is also the word cloud of the most common words in the Portuguese Constitution (not to scale):
TF-IDF and independent terms
TF-IDF (term frequency-inverse document frequency) is a measure of how important a word is to one document within a collection of documents, in this case the two constitutions. It clearly highlights words that are exclusive to one document or far more frequent in it than in the others of the group. For more on TF-IDF, check here.
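To make the idea concrete, here is a minimal base R sketch of one common TF-IDF variant: term frequency is a word's count divided by the document's total word count, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The token lists in `docs` are toy data, and the formula choice is an assumption, since the post does not say which variant was used. With only two documents, any word present in both gets an IDF of log(2/2) = 0, which is why shared words like “lei” drop out.

```r
# Toy token lists, one per document (not the real constitutions).
docs <- list(
  brazil   = c("lei", "federal", "congresso", "lei", "federal"),
  portugal = c("lei", "assembleia", "assembleia", "direitos")
)

# One row per (document, word) pair with its count and the document total.
counts <- do.call(rbind, lapply(names(docs), function(d) {
  tab <- table(docs[[d]])
  data.frame(
    document = d,
    word = names(tab),
    n = as.integer(tab),
    total = length(docs[[d]]),
    stringsAsFactors = FALSE
  )
}))

# Term frequency: share of the document taken up by the word.
counts$tf <- counts$n / counts$total

# Inverse document frequency: log(number of documents / documents containing the word).
doc_freq <- table(counts$word)
counts$idf <- log(length(docs) / as.numeric(doc_freq[counts$word]))

counts$tf_idf <- counts$tf * counts$idf

# Highest scores first: words that characterize a single document.
counts <- counts[order(-counts$tf_idf), ]
print(counts)
```

Sorting by `tf_idf` within each document and keeping the top rows would give the kind of per-document ranking that a TF-IDF chart displays.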
The figure below shows the words that are more frequent within a document and don’t appear much or at all in the other. Terms on the left, such as “federal”, “congresso” and “complementar” are used only in the Brazilian Constitution, while “assembleia”, “autónomas” and “económico” appear only in the Portuguese document.
Note: some words, like “económico” and “sector”, are spelled differently in Brazil and in Portugal, so a follow-up study would be needed to “normalize” the two varieties of Portuguese. That will probably be a study of its own soon, stay tuned.
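One simple way to approach that normalization is a lookup table from one spelling to the other, applied before counting. The sketch below is only a hypothetical starting point: `pt_to_br` holds three hand-picked pairs and `normalize_spelling()` is an illustrative helper, not part of any package. A real study would need a far more complete mapping.

```r
# Hypothetical mapping from European to Brazilian spellings (illustrative only).
pt_to_br <- c(
  "económico" = "econômico",
  "sector"    = "setor",
  "autónomas" = "autônomas"
)

# Replace every word found in the mapping; leave all other words untouched.
normalize_spelling <- function(words, mapping = pt_to_br) {
  hit <- words %in% names(mapping)
  words[hit] <- unname(mapping[words[hit]])
  words
}

normalized <- normalize_spelling(c("sector", "económico", "lei"))
print(normalized)
```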
Conclusions
- Text Mining is a great technique for analyzing texts and allows us to compare documents as important as a country’s Constitution.
- The Brazilian Constitution is considerably longer than its Portuguese counterpart (25,042 vs. 15,111 words after removing stopwords) and has a slightly lower share of unique words.
- At least as far as the most common words are concerned, the Portuguese Constitution is more focused on the common good and has many instances of the Portuguese equivalents of “social”, “rights” and “citizens”.
- Words like “federal”, “congresso” and “complementar” are used almost exclusively in the Brazilian Constitution and “assembleia”, “autónomas” and “económico” only in the Portuguese one.
For the full code and texts used in this post, head to my GitHub page, and check out some of my recent work here.