Top NLP (Natural Language Processing) Projects Using Python (Includes Links to GitHub Repositories)

January 12, 2021 by systems

Bert-as-service

Github

Official Documentation

Bert-as-service is a sentence encoding service that maps a variable-length sentence to a fixed-length vector for Python users.

BERT is an NLP model developed by Google for pre-training language representations. It leverages the enormous amount of plain text data publicly available on the web and is trained in an unsupervised manner. Pre-training a BERT model is a fairly expensive yet one-time procedure for each language. Fortunately, Google has released several pre-trained models, which you can download here.

Sentence encoding/embedding is an upstream task required in many NLP applications, e.g. sentiment analysis and text classification. The goal is to represent a variable-length sentence as a fixed-length vector, e.g. hello world as [0.1, 0.3, 0.9]. Each element of the vector should “encode” some semantics of the original sentence.

Finally, bert-as-service uses BERT as a sentence encoder and hosts it as a service via ZeroMQ, allowing you to map sentences into fixed-length representations in just two lines of code.
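For example, a minimal sketch of those two lines (assuming you have installed the client with pip install bert-serving-client and started the server with bert-serving-start against a downloaded pre-trained model; the model path below is a placeholder):

```python
# Server side (run once in a separate terminal, pointing at a downloaded
# pre-trained BERT model, e.g. uncased_L-12_H-768_A-12):
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12/ -num_worker=2

from bert_serving.client import BertClient

bc = BertClient()                                     # connects to the server on localhost by default
vectors = bc.encode(["hello world", "good morning"])  # one fixed-length vector per sentence
print(vectors.shape)                                  # e.g. (2, 768) for the base model
```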

What makes it special?

  • state of the art
  • easy to use
  • fast
  • scalable
  • reliable

TextBlob

Github

Official Documentation

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks.

A simple, Pythonic text processing library, TextBlob is known for:

  • Sentiment analysis,
  • part-of-speech tagging,
  • noun phrase extraction,
  • translation,
  • and more.

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

Features it offers:

  • Noun phrase extraction
  • Part-of-speech tagging
  • Sentiment analysis
  • Classification (Naive Bayes, Decision Tree)
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Spelling correction
  • Add new models or languages through extensions
  • WordNet integration
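A minimal sketch of a few of these features in action (assuming TextBlob is installed via pip install textblob and its corpora downloaded with python -m textblob.download_corpora):

```python
from textblob import TextBlob

blob = TextBlob("TextBlob makes common NLP tasks remarkably simple. I love it!")

print(blob.tags)          # part-of-speech tags, e.g. [('TextBlob', 'NNP'), ...]
print(blob.noun_phrases)  # noun phrase extraction
print(blob.words)         # tokenization into words
print(blob.sentences)     # tokenization into sentences
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
```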

Ciphey

Github

Ciphey is a library that automatically decrypts encryptions without knowing the key or cipher, decodes encodings, and cracks hashes.

It is a fully automated decryption/decoding/cracking tool: you input encrypted text and get the decrypted text back, using natural language processing and artificial intelligence along with some common sense.

The question may arise: what type of encryption?

That’s the point. You don’t know; you just know it’s possibly encrypted. Ciphey will figure it out for you. Ciphey can solve most things in 3 seconds or less.

Ciphey aims to be a tool to automate a lot of decryptions & decodings such as multiple base encodings, classical ciphers, hashes or more advanced cryptography.

If you don’t know much about cryptography, or you want to quickly check the ciphertext before working on it yourself, Ciphey is for you.

Why Ciphey?

  • 50+ encryptions/encodings
  • Custom Built Artificial Intelligence with Augmented Search (AuSearch) for answering the question “what encryption was used?”
  • Custom-built natural language processing module
  • Multi-Language Support
  • Supports encryptions and hashes
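Ciphey is driven from the command line; a minimal sketch of calling it from Python (assuming Ciphey is installed via pip install ciphey and using the -t flag described in its README to pass the ciphertext) might look like this:

```python
import base64
import subprocess

# Example ciphertext: "hello world" encoded with Base64, one of the many
# encodings Ciphey can detect and reverse automatically.
ciphertext = base64.b64encode(b"hello world").decode()

# Invoke the Ciphey CLI on the text; -t passes the ciphertext directly.
result = subprocess.run(["ciphey", "-t", ciphertext], capture_output=True, text=True)

print(result.stdout)  # Ciphey reports the decoders it tried and the recovered plaintext
```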

Doccano

Github

Official Documentation

Doccano is an open-source text annotation tool for machine learning practitioners.

It provides annotation features for text classification, sequence labelling and sequence to sequence tasks. So, you can create labelled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.

Features

  • Collaborative annotation
  • Multi-language support
  • Mobile support
  • Emoji support
  • Dark theme
  • RESTful API

LazyNLP

Github

LazyNLP is an open-source library to scrape and clean web pages to create massive datasets.

A straightforward library that allows you to crawl, clean up, and deduplicate webpages to create massive monolingual datasets. Using this library, you should be able to create datasets larger than the one used by OpenAI for GPT-2.

The library requires Python 3 and builds the dataset by scraping the webpages at the URLs you supply.
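A rough sketch of the pipeline, with function names and arguments taken from our reading of the LazyNLP README (treat the exact signatures and the file names below as assumptions and check the repository before relying on them):

```python
import lazynlp

# 1. Download the webpages listed (one URL per line) in urls.txt into raw_pages/.
#    download_pages and its keyword arguments follow the project README.
lazynlp.download_pages("urls.txt", "raw_pages/", timeout=30, default_skip=True)

# 2. Clean one downloaded page: strip HTML markup and other non-text noise.
#    "raw_pages/0001.txt" is a hypothetical output file name for illustration.
with open("raw_pages/0001.txt", encoding="utf-8") as f:
    cleaned = lazynlp.clean_page(f.read())
```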

Textract

Github

Official Documentation

Textract is an open-source library to extract text from any document without any muss or fuss. This package provides a single interface for extracting content from any type of file, without any irrelevant markup.
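A minimal sketch (assuming textract is installed via pip install textract; report.pdf is a placeholder path):

```python
import textract

# Extract plain text from a document; textract picks the right backend
# (PDF, DOCX, images via OCR, etc.) based on the file extension.
text = textract.process("report.pdf")

print(text.decode("utf-8"))  # process() returns bytes
```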

Filed Under: Artificial Intelligence
