NEO Share


Natural Language Processing — Intro!

December 17, 2020 by systems

Amit Singh Rathore

NLP (Natural Language Processing) is a sub-field of AI that enables computers to understand and process human-generated text. In this blog we will learn the basic tasks of NLP and some of its applications.

Once we have the text, the first task is to pre-process it.

Sentence Segmentation

Break the text into individual sentences.
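A minimal sketch of sentence segmentation, assuming that a sentence ends at `.`, `!` or `?` followed by whitespace. The function name `split_sentences` is hypothetical, and the rule is deliberately naive: it will also split after abbreviations such as "e.g.", which trained segmenters handle better.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    # The lookbehind (?<=...) is zero-width, so the punctuation stays attached
    # to the sentence it ends; only the whitespace is consumed.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input is empty).
    return [p for p in parts if p]

print(split_sentences("NLP is fun. It has many uses! Do you agree?"))
# ['NLP is fun.', 'It has many uses!', 'Do you agree?']
```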

Tokenization

Split each sentence into tokens (usually words). The set of distinct tokens forms the vocabulary.
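As an illustration, here is a simple regex-based tokenizer. It is an assumption-laden sketch, not a production tokenizer: it lowercases everything and treats any run of letters, digits or underscores as a token, so a contraction like "don't" becomes "don" and "t".

```python
import re

def tokenize(sentence):
    """Lowercase the sentence and return its word tokens."""
    # \w+ matches runs of letters, digits and underscores; punctuation is dropped.
    return re.findall(r"\w+", sentence.lower())

tokens = tokenize("The cat sat on the mat.")
vocabulary = sorted(set(tokens))  # distinct tokens = the vocabulary
print(tokens)      # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(vocabulary)  # ['cat', 'mat', 'on', 'sat', 'the']
```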

Stop-words removal

Remove the most common words, which carry little meaning on their own, e.g. the, a, an, of, in.
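A small sketch of stop-word removal over already-lowercased tokens. The `STOP_WORDS` set here is a tiny hand-picked list for illustration only; NLP libraries ship much longer lists, and which words count as "stop words" depends on the task.

```python
# Hypothetical, hand-picked stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "is", "and", "to"}

def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word list (tokens assumed lowercase)."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "cat", "sat", "on", "the", "mat"]))  # ['cat', 'sat', 'mat']
```

A set is used so that each membership check is a constant-time lookup rather than a scan of a list.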

Stemming/Lemmatization

Stemming: strip affixes and keep the stem of the word.

Lemmatization: find the dictionary (root) form of a word, called its lemma.
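To make the difference concrete, here is a toy comparison. Both the suffix list in `naive_stem` and the lemma table in `lemmatize` are hypothetical and tiny; real stemmers (such as the Porter stemmer) apply many ordered rules, and real lemmatizers use a full lexicon plus part-of-speech information. The point is that stemming chops mechanically, while lemmatization can map irregular forms like "better" or "mice" that no suffix rule would catch.

```python
# Hypothetical, deliberately tiny suffix rules (checked in this order).
SUFFIXES = ("ing", "ed", "ly", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary for illustration.
LEMMAS = {"better": "good", "ran": "run", "mice": "mouse", "is": "be", "are": "be"}

def lemmatize(word):
    """Look the word up in the lemma dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["connected", "connecting", "connects", "better", "mice"]:
    print(w, naive_stem(w), lemmatize(w))
# connected connect connected
# connecting connect connecting
# connects connect connects
# better better good
# mice mice mouse
```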

Standardization of text

Domain-specific cleaning, which depends on the domain of the text corpus.

Noise removal

Remove punctuation and numbers.
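One way to sketch noise removal with only the standard library: build a translation table that deletes ASCII punctuation and digits, then collapse the leftover whitespace. The function name `remove_noise` is hypothetical, and this only covers ASCII characters; Unicode punctuation such as curly quotes would need extra handling.

```python
import string

# Translation table that deletes every ASCII punctuation character and digit.
NOISE = str.maketrans("", "", string.punctuation + string.digits)

def remove_noise(text):
    """Drop punctuation and digits, then collapse repeated whitespace."""
    return " ".join(text.translate(NOISE).split())

print(remove_noise("Order #42 shipped, on 2020-12-17!"))  # Order shipped on
```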

NLP outcomes are not very visually intuitive, so we rely on certain visualization methods to present the results. The following are a few of those techniques.

Word Cloud

Key Phrases

Text Network

Parts-of-Speech Tagging

Before a machine can process text, we need to convert it into numeric vectors. The following are some techniques that do so.

Bag of words

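In this method we build a vocabulary (the "bag") from the whole corpus and represent each sentence as a vector with one entry per vocabulary word, recording whether (or how many times) that word occurs in the sentence. Word order is ignored.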
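A minimal bag-of-words sketch using word counts. The helper names are hypothetical; for the presence/absence variant, replace each count with `min(count, 1)`.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens (simple regex tokenizer)."""
    return re.findall(r"\w+", text.lower())

def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        # One position per vocabulary word; Counter returns 0 for absent words.
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

docs = ["the cat sat", "the cat sat on the mat"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```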

TF-IDF

It improves upon the bag-of-words approach by penalizing words that are common across documents and therefore carry little information: if a word appears in every document, it does not help tell the documents apart. The weight of a word in a document is its term frequency (TF) multiplied by its inverse document frequency (IDF), which is defined as:

log(number of documents/number of documents containing the word)
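A small sketch of that weighting, assuming raw counts for TF and the natural logarithm for IDF. Libraries use variants (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF by default), so their numbers will differ from these. The function name `tf_idf` is hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens (simple regex tokenizer)."""
    return re.findall(r"\w+", text.lower())

def tf_idf(docs):
    """Return one {word: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw count as TF, times log(N / df) as IDF.
        weights.append({w: c * math.log(n_docs / df[w]) for w, c in counts.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
for w in tf_idf(docs):
    print({k: round(v, 3) for k, v in w.items()})
# {'the': 0.0, 'cat': 0.405, 'sat': 0.405}
# {'the': 0.0, 'dog': 1.099, 'sat': 0.405}
# {'the': 0.0, 'cat': 0.405, 'ran': 1.099}
```

"the" occurs in all three documents, so its IDF is log(3/3) = 0 and it gets no weight, while the rarer "dog" and "ran" get the highest weights.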

Word Embedding

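A dense vector representation of a word that captures its meaning, its semantic relationships to other words, and its context. Words with similar meanings end up with similar vectors.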
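Similarity between embeddings is commonly measured with cosine similarity. The 3-dimensional vectors below are made up purely for illustration; real embeddings (for example from Word2Vec) are learned from data and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, not learned from any data.
EMBEDDINGS = {
    "king":  [0.8, 0.65, 0.1],
    "queen": [0.78, 0.7, 0.15],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))  # close to 1
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["apple"]))  # much smaller
```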

Language Models

Word2Vec, BERT, ELMo

Let us look at a few prominent applications of NLP. The example text is taken from this page.

Key Phrase Extraction

Identify the key phrases in a text and their relative prominence. This gives a high-level idea of what the text is about.

Sentiment analysis

Determine the sentiment (e.g. positive, negative, neutral, angry, enthusiastic) that a text expresses about a given subject.
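A toy lexicon-based sketch of polarity classification (positive, negative, neutral). The `LEXICON` scores are hypothetical; real lexicon-based tools such as VADER use thousands of scored words plus rules for negation and intensifiers, and many modern systems use trained classifiers instead. This sketch ignores negation, so "not good" would count as positive.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment(text):
    """Sum the lexicon scores of the words and map the total to a label."""
    score = sum(LEXICON.get(w, 0) for w in re.findall(r"\w+", text.lower()))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great"))  # positive
print(sentiment("Terrible battery and bad screen"))         # negative
print(sentiment("The phone arrived on Monday"))             # neutral
```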

Topic Modelling

Discover hidden semantic structures or abstract concepts (topics) in a collection of documents.

Contextual Search

Retrieve the documents that are contextually and semantically similar to the user's query, not only those that repeat its exact words.
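The retrieval loop behind such a search can be sketched as "vectorize the query and every document, then rank documents by cosine similarity". As a simplifying assumption, this sketch uses plain word-count vectors, which only capture lexical overlap; a contextual search engine would substitute semantic embeddings for `vectorize`. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Sparse word-count vector (lexical stand-in for a semantic embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, docs, top_k=2):
    """Return the top_k documents ranked by similarity to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "How to train a neural network",
    "Best pizza recipes for beginners",
    "Neural network training tips",
]
print(search("neural network training", docs))
# ['Neural network training tips', 'How to train a neural network']
```

Note that "train" and "training" do not match here; stemming or semantic embeddings would close that gap.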

Text Summarization

Abstractive Summarization: generate new sentences that convey the meaning of the original text in fewer sentences.

Extractive Summarization: identify the most important sentences in the original text and extract them as they are.
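A simple frequency-based sketch of extractive summarization: score each sentence by the average corpus frequency of its non-stop-words and keep the top-scoring sentences in their original order. This heuristic is an assumption chosen for brevity (graph-based methods such as TextRank are a common alternative), and the stop-word list is hypothetical. Abstractive summarization needs a generative model and is not shown.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "it"}

def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=1):
    """Extractive summary: keep the n highest-scoring sentences in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        # Average frequency (across the whole text) of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)

text = ("NLP helps computers process text. "
        "Text processing starts with cleaning the text. "
        "The weather was nice yesterday.")
print(summarize(text))  # Text processing starts with cleaning the text.
```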

Entity Recognition

Locate named entities in a text and classify them into pre-defined categories (e.g. person, organization, location).

Happy understanding!!


Copyright © 2021 NEO Share
