NEO Share
The NLP Cypher | 01.03.21

January 4, 2021 by systems

Hey, welcome back, you made it! Now, let us begin 2021 on the right path with an impromptu moment of customer service from Elon Musk:


FYI

If you haven’t read our Mini Year Review, we released it last week while everyone was on holiday 😬. Per usual, if you enjoy the read please give our article a 👏👏 and share it with your friends and enemies!

Now, let’s play a game. Say we have all 7,129 NLP paper abstracts from the entire year of 2020, and we run BERTopic 👇 on top of those abstracts for some topic modeling to find the most frequently discussed topics.

What do we get?

  1. speech-related
  2. bert-related
  3. dialogue-related
  4. embeddings-related
  5. graphs-related

For a more detailed readout of the topics 👇
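Under the hood, BERTopic clusters document embeddings and then ranks each cluster’s keywords with a class-based TF-IDF (c-TF-IDF). A minimal pure-Python sketch of that scoring step, on toy data standing in for the abstracts (this is an illustration of the formula, not the library itself):

```python
import math
from collections import Counter

def c_tf_idf(classes):
    """classes: dict mapping topic name -> list of tokens (all docs in the
    topic concatenated). Returns BERTopic-style class-based TF-IDF scores:
    tf(w, c) * log(1 + A / f(w)), with A the average tokens per class."""
    tf = {c: Counter(tokens) for c, tokens in classes.items()}
    f = Counter()                       # total frequency of each word
    for counts in tf.values():
        f.update(counts)
    a = sum(f.values()) / len(classes)  # average tokens per class
    return {
        c: {w: counts[w] * math.log(1 + a / f[w]) for w in counts}
        for c, counts in tf.items()
    }

# Toy "abstract" clusters standing in for the 7,129 abstracts.
topics = {
    "speech": "speech audio asr speech model".split(),
    "bert":   "bert transformer bert model".split(),
}
scores = c_tf_idf(topics)
top_speech = max(scores["speech"], key=scores["speech"].get)
print(top_speech)  # 'speech': shared words like 'model' get down-weighted
```

The real library wraps this behind `BERTopic().fit_transform(docs)`, with sentence-transformer embeddings and HDBSCAN clustering doing the grouping first.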

The Pile dataset, an 800GB monster of English text for language modeling. 👀

The Pile is composed of 22 large and diverse datasets:

paper

The diversity of the dataset is what makes it unique and powerful for holding cross-domain knowledge.

As a result, to score well on the Pile BPB (bits per byte) benchmark, a model should

…“be able to understand many disparate domains including books, github repositories, webpages, chat logs, and medical, physics, math, computer science, and philosophy papers.”

The dataset is formatted as JSON Lines and compressed with zstandard. You can also view more datasets on The Eye 👁 here:

The Pile

The 👁
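Each line in a Pile shard is one JSON object with a `text` field and a `meta` field naming the component dataset. A minimal sketch of tallying records per component (the `zstandard` package for decompression is an assumption about your environment and is shown only in a comment):

```python
import io
import json
from collections import Counter

def tally_pile_sets(lines):
    """Count records per component dataset in a Pile-style JSON Lines
    stream; each line is one JSON object with 'text' and 'meta' fields."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        counts[record["meta"]["pile_set_name"]] += 1
    return counts

# Reading a real shard would look roughly like (needs `pip install zstandard`):
#   import zstandard
#   with open("00.jsonl.zst", "rb") as fh:
#       reader = io.TextIOWrapper(zstandard.ZstdDecompressor().stream_reader(fh))
#       print(tally_pile_sets(reader))

# Inline sample standing in for a decompressed shard:
sample = io.StringIO(
    '{"text": "fn main() {}", "meta": {"pile_set_name": "Github"}}\n'
    '{"text": "We prove...", "meta": {"pile_set_name": "ArXiv"}}\n'
    '{"text": "pub fn new()", "meta": {"pile_set_name": "Github"}}\n'
)
counts = tally_pile_sets(sample)
print(counts.most_common(1))
```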

Corporations are adapting to the NLP models that listen in on filings and other financial disclosures. According to a new study, corporations are choosing their words carefully in order to fool the machines and reduce the negative sentiment detected in their statements.

Paper:
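The models being gamed here are often simple lexicon-based scorers over filing text, in the spirit of finance word lists like Loughran-McDonald. A toy sketch with a made-up mini-lexicon (a few words, not the real thousands-strong lists) shows how word choice alone moves the score:

```python
import re

# Tiny made-up stand-in for a finance sentiment lexicon such as
# Loughran-McDonald; real lists contain thousands of words.
NEGATIVE = {"loss", "decline", "litigation", "impairment"}
POSITIVE = {"growth", "improvement", "record"}

def sentiment(text):
    """Net sentiment: (positive hits - negative hits) / total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

blunt = "We recorded a loss and expect further decline."
hedged = "Results were below expectations and headwinds persist."
print(sentiment(blunt) < sentiment(hedged))  # True: same news, softer score
```

Swapping flagged words like "loss" for vaguer phrasing that the lexicon misses is exactly the adaptation the study describes.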

This week, a couple of ML book drafts dropped from well-known authors in machine learning. The first is Jurafsky and Martin’s Speech and Language Processing, with new chapters and updates:

Highlights:

- New version of Chapter 8 (bringing together POS and NER in one chapter)

- New version of Chapter 9 (with Transformers)

- Chapter 11 (MT)

- Neural span parsing and CCG parsing moved into Chapter 13 (Constituency Parsing); Statistical Constituency Parsing moved to Appendix C

- New version of Chapter 23 (QA modernized)

- Chapter 26 (ASR + TTS)

Also, Murphy’s Probabilistic Machine Learning draft made the rounds this week, and there’s code along with it! Enjoy.

https://probml.github.io/pml-book/book1.html

code:

There’s a new way to explore the Internet Archive for awesome content.

Someone built 👇 as a way to block ads 🤣.

“Made an AI to track and analyze every website, a bit like a web crawler, to find and identify ads. It is a list containing over 1,300,000 domains used by ads, trackers, miners, and malware.”
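A domain blocklist like that is typically applied hosts-file style, by exact and parent-domain matching. A minimal sketch with a hypothetical three-entry list (the real one has over 1,300,000 domains):

```python
def build_blocklist(domains):
    """Normalize a list of blocked domains into a lookup set."""
    return {d.strip().lower() for d in domains if d.strip()}

def is_blocked(host, blocklist):
    """True if the host or any parent domain appears in the blocklist,
    so cdn.tracker.example is caught by an entry for tracker.example."""
    host = host.lower().rstrip(".")
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))

# Hypothetical entries for illustration only.
blocklist = build_blocklist(["tracker.example", "ads.example.net", "miner.test"])
print(is_blocked("cdn.tracker.example", blocklist))  # True: parent-domain match
print(is_blocked("example.net", blocklist))          # False: only its subdomain is listed
```

Matching parent domains (rather than exact hosts only) is what lets a list that size cover the endless ad-server subdomains without enumerating them all.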

A collection of recently released repos that caught our 👁:

Filed Under: Machine Learning
