bert-as-service is a sentence encoding service for Python users that maps a variable-length sentence to a fixed-length vector.
BERT is an NLP model developed by Google for pre-training language representations. It leverages an enormous amount of plain text data publicly available on the web and is trained in an unsupervised manner. Pre-training a BERT model is a fairly expensive yet one-time procedure for each language. Fortunately, Google has released several pre-trained models that you can download.
Sentence encoding (or embedding) is an upstream task required in many NLP applications, e.g. sentiment analysis and text classification. The goal is to represent a variable-length sentence as a fixed-length vector, e.g. mapping “hello world” to [0.1, 0.3, 0.9]. Each element of the vector should “encode” some semantics of the original sentence.
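To make the "variable-length in, fixed-length out" idea concrete, here is a toy sketch in plain Python. It hashes words into a fixed number of buckets, which is only a stand-in for illustration: the function name and the bucket count are assumptions, and unlike BERT this toy does not capture any real semantics.

```python
import hashlib
import math


def toy_sentence_vector(sentence, dim=8):
    """Map a sentence of any length to a vector of `dim` floats.

    Toy feature hashing over words: it shows the input/output shape only
    and does not capture semantics the way BERT does.
    """
    vector = [0.0] * dim
    for word in sentence.lower().split():
        # A stable hash (unlike the built-in hash()) picks the bucket for this word.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    # Scale to unit length so long and short sentences are comparable.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


short = toy_sentence_vector("hello world")
long = toy_sentence_vector("a much longer sentence with many more words in it")
print(len(short), len(long))  # both lengths equal dim
```

However long the input is, the output always has the same number of elements, which is what lets downstream models consume it.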
Finally, bert-as-service uses BERT as a sentence encoder and hosts it as a service via ZeroMQ, allowing you to map sentences into fixed-length representations in just two lines of code.
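The client side might look roughly like the sketch below. It assumes the bert-serving-client package is installed and that a bert-as-service server has already been started separately with a downloaded pre-trained model. The wrapper function and the FakeClient class are hypothetical helpers added here so the example runs without a server; only BertClient and its encode method come from the library.

```python
def encode_sentences(sentences, client=None):
    """Return one fixed-length vector per sentence.

    `client` is any object with an `encode(list_of_str)` method. When it is
    omitted, a BertClient is created (assumption: a server is already
    running on this machine and bert-serving-client is installed).
    """
    if client is None:
        from bert_serving.client import BertClient  # pip install bert-serving-client
        client = BertClient()
    return client.encode(list(sentences))


class FakeClient:
    """Hypothetical stand-in so the sketch runs without a server."""

    def encode(self, sentences):
        # Three zeros per sentence; a real server returns BERT vectors instead.
        return [[0.0, 0.0, 0.0] for _ in sentences]


vectors = encode_sentences(["hello world", "First do it"], client=FakeClient())
print(len(vectors), len(vectors[0]))
```

Against a real server, the "two lines" are the BertClient() construction and the encode call; everything else here is scaffolding for the demo.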
What makes it special?
- state of the art
- easy to use
- fast
- scalable
- reliable
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks.
TextBlob, a simple and Pythonic text processing library, is known for:
- Sentiment analysis
- Part-of-speech tagging
- Noun phrase extraction
- Translation
- And more
TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.
Features it offers:
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
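Two of the listed features, word frequencies and n-grams, are simple enough to illustrate in plain Python. The sketch below is not TextBlob's implementation or API; the function names and the regex tokenizer are assumptions chosen to show what these features compute.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "The quick brown fox jumps over the lazy dog. The dog sleeps."
tokens = tokenize(text)
word_counts = Counter(tokens)

print(word_counts["the"])     # 3
print(word_counts["dog"])     # 2
print(ngrams(tokens, 2)[:2])  # [('the', 'quick'), ('quick', 'brown')]
```

A library like TextBlob wraps this kind of logic, plus the heavier features such as tagging and sentiment, behind a single text object.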
Ciphey is a tool that automatically decrypts ciphertext without knowing the key or cipher, decodes encodings, and cracks hashes.
It is a fully automated decryption/decoding/cracking tool: you input encrypted text and get the decrypted text back, using natural language processing and artificial intelligence along with some common sense.
You may ask: what type of encryption?
That’s the point. You don’t know; you just know it’s possibly encrypted. Ciphey will figure it out for you. Ciphey can solve most things in 3 seconds or less.
Ciphey aims to be a tool to automate a lot of decryptions & decodings such as multiple base encodings, classical ciphers, hashes or more advanced cryptography.
If you don’t know much about cryptography, or you want to quickly check the ciphertext before working on it yourself, Ciphey is for you.
Why Ciphey?
- 50+ encryptions/encodings
- Custom Built Artificial Intelligence with Augmented Search (AuSearch) for answering the question “what encryption was used?”
- Custom-built natural language processing module
- Multi-Language Support
- Supports encryptions and hashes
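The general idea of "try decoders until the output looks like language" can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions, not Ciphey's AuSearch or its language checker: the three decoders, the tiny word list, and all function names are hypothetical choices made for illustration.

```python
import base64
import binascii
import codecs

# Tiny word list standing in for a real language checker (assumption).
COMMON_WORDS = {"the", "and", "is", "to", "of", "hello", "world", "this", "a", "secret"}


def looks_like_english(text):
    """Return True when at least half of the words are in COMMON_WORDS."""
    words = text.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w.strip(".,!?") in COMMON_WORDS)
    return hits / len(words) >= 0.5


def try_base64(text):
    """Return the decoded text, or None if the input is not valid base64/UTF-8."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def try_hex(text):
    """Return the decoded text, or None if the input is not valid hex/UTF-8."""
    try:
        return bytes.fromhex(text).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def try_rot13(text):
    """ROT13 always produces some output; the language check filters it."""
    return codecs.decode(text, "rot_13")


DECODERS = {"base64": try_base64, "hex": try_hex, "rot13": try_rot13}


def auto_decode(ciphertext):
    """Try each decoder and return (name, plaintext) for the first English-looking result."""
    for name, decoder in DECODERS.items():
        candidate = decoder(ciphertext)
        if candidate is not None and looks_like_english(candidate):
            return name, candidate
    return None, None


print(auto_decode("aGVsbG8gd29ybGQ="))        # ('base64', 'hello world')
print(auto_decode("68656c6c6f20776f726c64"))  # ('hex', 'hello world')
print(auto_decode("uryyb jbeyq"))             # ('rot13', 'hello world')
```

A real tool has to handle nested encodings, many more ciphers, and a far better plaintext detector, which is where the search strategy and language model matter.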
Doccano is an open-source text annotation tool for machine learning practitioners.
It provides annotation features for text classification, sequence labelling and sequence-to-sequence tasks, so you can create labelled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.
Features
- Collaborative annotation
- Multi-language support
- Mobile support
- Emoji support
- Dark theme
- RESTful API
LazyNLP is an open-source library to scrape and clean web pages to create massive datasets.
It is a straightforward library that lets you crawl, clean up, and deduplicate web pages to create massive monolingual datasets. Using it, you should be able to create datasets larger than the one used by OpenAI for GPT-2.
The library runs on Python 3 and builds the dataset by scraping the web pages at the URLs you supply.
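The clean-up and deduplication steps can be illustrated with the standard library alone. The sketch below is not LazyNLP's API: the class and function names are hypothetical, the pages are hard-coded strings instead of downloaded URLs, and exact-match hashing is a simplification, so the library's own deduplication method may differ.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(html):
    """Strip markup and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def deduplicate(pages):
    """Keep the first page for each distinct cleaned, case-folded text."""
    seen, unique = set(), []
    for html in pages:
        text = clean_html(html)
        key = hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()
        if text and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


pages = [
    "<html><body><p>Hello   world</p><script>var x = 1;</script></body></html>",
    "<div>hello world</div>",
    "<p>Another page</p>",
]
print(deduplicate(pages))  # ['Hello world', 'Another page']
```

At web scale, exact hashing misses near-duplicates, so a production pipeline would more likely compare overlapping word sequences between documents.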
Textract is an open-source library to extract text from any document without any muss or fuss. This package provides a single interface for extracting content from any type of file, without any irrelevant markup.