Python Packages for NLP: Part 1

Polyglot: A Python Package for NLP Operations

Natural Language Processing (NLP) aims to make human (natural) language understandable to machines. It covers tasks such as text analysis, text mining, and sentiment (polarity) analysis. Several Python packages make these operations easy.

Each NLP package offers its own set of features, which makes it easier for the end user to perform text analysis and other NLP operations. In this series of articles, we will explore different NLP packages for Python and what each of them can do.

In this article, we will discuss Polyglot, an open-source Python package for manipulating text and extracting useful information from it. It bundles several features behind a simple interface and supports a wide range of languages. Here we will walk through these features and how to use them.

Let’s get started.

To get started, we first need to install polyglot and its dependencies. For this article we will use Google Colab; the commands below install polyglot and the libraries it depends on.

!pip3 install polyglot
!pip3 install pyicu
!pip3 install pycld2
!pip3 install morfessor

After installing these libraries, we also need to download the polyglot models that will be used in this article.

!polyglot download embeddings2.en
!polyglot download pos2.en
!polyglot download ner2.en
!polyglot download morph2.en
!polyglot download sentiment2.en
!polyglot download transliteration2.hi

The next step is to import the polyglot modules and classes that we will explore in this article.

import polyglot
from polyglot.detect import Detector
from polyglot.text import Text, Word
from polyglot.mapping import Embedding
from polyglot.transliteration import Transliterator

Let us now explore some of the NLP features that polyglot provides. Before that, let us define some sample text to work with.

sample_text = '''Piyush is an Aspiring Data Scientist and is working hard to get there. He stood Kaggle grandmaster 4 year consistently. His goal is to work for Google.'''

1. Language Detection

Polyglot’s language detector can easily identify the language in which the text is written.

#Language detection
detector = Detector(sample_text)
print(detector.language)

Language(Source: By Author)
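The detected language is an object rather than a plain string. As a minimal sketch, assuming it exposes name, code, and confidence attributes (which is how polyglot's Language object appears to be laid out), a small helper can turn it into a one-line report. The describe_language helper and the stand-in object are hypothetical, and the values are illustrative so that the snippet runs without polyglot's models installed.

```python
from types import SimpleNamespace


def describe_language(language):
    """Format a detected language as 'Name (code), confidence NN%'."""
    return f"{language.name} ({language.code}), confidence {language.confidence:.0f}%"


# Stand-in with the assumed attribute names; the values are illustrative,
# not real detector output. With polyglot installed you would pass
# Detector(sample_text).language instead.
fake_language = SimpleNamespace(name="English", code="en", confidence=99.0)
print(describe_language(fake_language))
```

Keeping the formatting in its own function makes it easy to apply the same report to every text in a dataset.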

2. Sentences and Words

To extract the sentences or words from a text or corpus, we can use the words and sentences properties of polyglot's Text class.

#Tokenize
text = Text(sample_text)
text.words

Words(Source: By Author)

text.sentences

Sentences(Source: By Author)
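Once the text is tokenized, the word list can feed ordinary Python tools. The sketch below counts the most frequent tokens with collections.Counter. The top_words helper is a hypothetical name, and the token list is hand-written for illustration; with polyglot you would pass Text(sample_text).words instead.

```python
from collections import Counter


def top_words(words, n=3):
    """Return the n most frequent tokens, compared case-insensitively."""
    counts = Counter(str(w).lower() for w in words)
    return counts.most_common(n)


# Illustrative token list, not real tokenizer output.
tokens = ["His", "goal", "is", "to", "work", "and", "is", "working", "to", "get", "is"]
print(top_words(tokens, n=2))
```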

3. POS Tagging

Part-of-speech (POS) tagging labels each word with its grammatical role, such as noun or verb, which helps us understand the structure of the text.

#POS tagging
text.pos_tags

POS-Tagging(Source: By Author)
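The tagger's result is a list of (word, tag) pairs, which is easy to regroup. As a rough sketch, the hypothetical words_by_tag helper below collects the words under each tag. The pairs are hand-written for illustration and are not actual polyglot output; with polyglot you would pass Text(sample_text).pos_tags.

```python
from collections import defaultdict


def words_by_tag(pos_tags):
    """Group words under their POS tag, keeping first-seen order."""
    grouped = defaultdict(list)
    for word, tag in pos_tags:
        grouped[tag].append(str(word))
    return dict(grouped)


# Hand-written (word, tag) pairs for illustration only.
sample_tags = [("His", "PRON"), ("goal", "NOUN"), ("is", "VERB"), ("work", "NOUN")]
print(words_by_tag(sample_tags)["NOUN"])
```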

4. Named Entity Recognition

Named entity recognition (NER) identifies the people, organizations, and locations, if any, mentioned in the text.

#Named entity extraction
text.entities

NER(Source: By Author)
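Each extracted entity behaves like a list of words with a tag attribute such as I-PER, I-ORG, or I-LOC. Under that assumption, the sketch below groups entity strings by type. FakeChunk and entities_by_type are hypothetical names; FakeChunk only mimics the shape of a polyglot entity so the example runs without the NER model.

```python
class FakeChunk(list):
    """Stand-in for a polyglot entity: a list of words plus a tag."""

    def __init__(self, words, tag):
        super().__init__(words)
        self.tag = tag


def entities_by_type(entities):
    """Map each entity tag to the list of entity strings found."""
    grouped = {}
    for entity in entities:
        grouped.setdefault(entity.tag, []).append(" ".join(entity))
    return grouped


# Illustrative entities; with polyglot you would pass Text(sample_text).entities.
found = [FakeChunk(["Piyush"], "I-PER"), FakeChunk(["Google"], "I-ORG")]
print(entities_by_type(found))
```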

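5. Morphological Analysis

Morphological analysis breaks a word into its morphemes, the smallest units of meaning it is built from.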

#Morphological Analysis
words = ["programming", "parallel", "inevitable", "handsome"]
for w in words:
    w = Word(w, language="en")
    print(w, w.morphemes)

Morphological(Source: By Author)

6. Sentiment Analysis

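Polyglot assigns each word a polarity score: +1 for positive, -1 for negative, and 0 for neutral. We can use these scores to analyze the sentiment of a sentence.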

#Sentiment analysis
text = Text("Himanshu is a good programmer.")
for w in text.words:
    print(w, w.polarity)

Polarity(Source: By Author)
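Word-level polarities can be rolled up into a rough sentence-level score. The sketch below simply sums them and lists the words that carry sentiment. This is one simple aggregation, not polyglot's own method, and polarity_summary is a hypothetical helper. The scores are illustrative; with polyglot you would build the pairs as [(w, w.polarity) for w in text.words].

```python
def polarity_summary(scored_words):
    """Sum per-word polarities and list the words with a non-zero score."""
    total = sum(score for _, score in scored_words)
    carriers = [str(word) for word, score in scored_words if score != 0]
    return total, carriers


# Illustrative (word, polarity) pairs, not real model output.
scored = [("Himanshu", 0), ("is", 0), ("a", 0), ("good", 1), ("programmer", 0), (".", 0)]
print(polarity_summary(scored))
```

A positive total suggests a positive sentence, a negative total a negative one, and zero either a neutral sentence or one where the scores cancel out.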

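7. Transliteration

We can transliterate text into another script, here from English to Hindi. Note that this converts the characters rather than translating the meaning.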

#Transliteration
transliterator = Transliterator(source_lang="en", target_lang="hi")
new_text = ""
for i in "Piyush Ingale".split():
    new_text = new_text + " " + transliterator.transliterate(i)
new_text

Transliteration(Source: By Author)
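The loop above builds the result by string concatenation, which leaves a leading space. As a small sketch, the hypothetical transliterate_phrase helper below does the same word-by-word conversion with str.join. It accepts any word-level callable, so with polyglot you would pass transliterator.transliterate; here str.upper stands in so the snippet runs without the transliteration model.

```python
def transliterate_phrase(phrase, transliterate):
    """Apply a word-level transliteration callable to each word and rejoin."""
    return " ".join(transliterate(word) for word in phrase.split())


# str.upper is only a stand-in for transliterator.transliterate.
print(transliterate_phrase("Piyush Ingale", str.upper))
```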

This is how you can explore the different features of Polyglot on text datasets without any hassle.

Go ahead and try this with different text datasets. If you run into any difficulty, you can post it in the responses section.

This post is in collaboration with Piyush Ingale

Thanks for reading! If you want to get in touch with me, feel free to reach me on hmix13@gmail.com or my LinkedIn Profile. You can view my Github profile for different data science projects and packages tutorials. Also, feel free to explore my profile and read different articles I have written related to Data Science.
