What is BERT?

January 12, 2021 by systems

The same word can have different meanings in different contexts. For example:

I arrived at the bank after crossing the _____

We cannot tell whether this “bank” refers to a river bank or a financial institution without knowing the final word.

BERT learns word context from all of a word’s surrounding words, just like humans! This is achieved using self-attention. The quick sketch below shows this contextual behavior in action, and then we will walk through self-attention itself.
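A minimal sketch, not from the article, of this contextual behavior. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint; the bank_embedding helper is ours, purely for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_embedding(sentence: str) -> torch.Tensor:
    """Return BERT's contextual embedding for the token 'bank'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

river = bank_embedding("I arrived at the bank after crossing the river.")
money = bank_embedding("I arrived at the bank to deposit my money.")
shore = bank_embedding("I sat on the bank after crossing the river.")

cos = torch.nn.functional.cosine_similarity
print(cos(river, money, dim=0))   # lower: different senses of "bank"
print(cos(river, shore, dim=0))   # higher: same river-bank sense
```

The exact numbers depend on the checkpoint, but the two river-bank uses should score noticeably more similar than the river/financial pair, because each embedding is computed from the whole sentence. Now let us see how self-attention makes this possible.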


Consider the classic example sentence: “The animal didn’t cross the street because it was too tired.” What does “it” in this sentence refer to? Is it referring to the street or to the animal? It’s a simple question for a human, but not as simple for an algorithm.

Self-attention solves this problem. It allows the model to look at the other words in the sentence to get a better understanding of “it”.

Associating each word in a sentence with every other word is called self-attention. Let’s see how self-attention is calculated with an example.

Consider this phrase — “Action gets results”.


  1. We create three vectors for each word of the phrase: a Query vector, a Key vector, and a Value vector. They are obtained by multiplying each word’s embedding by three weight matrices that are learned and updated during training. Their roles will become clearer by the end of this section.
  2. The score for the first word “Action” is calculated by taking the dot product of its Query vector (q1) with the Key vectors (k1, k2, k3) of all the words.
  3. Then, these scores are divided by 8, the square root of the Key vectors’ dimension (√64 = 8).
  4. Next, these scores are normalized using the softmax activation function.
  5. These normalized scores are then multiplied by the value vectors (v1, v2, v3), and the resulting vectors are summed to arrive at the final vector (z1). It is then passed on to the feed-forward network as input (see the code sketch after this list).

So, z1 is the self-attention vector for the first word of the input sequence “Action gets results”.
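A minimal NumPy sketch, not from the article, that walks through the five steps above for the three-word phrase “Action gets results”. The random q/k/v vectors are stand-ins; in a real Transformer they come from multiplying word embeddings by the learned weight matrices mentioned in step 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64                            # key/query dimension; sqrt(64) = 8

# Step 1 (stand-in): one Query, Key, and Value vector per word.
q = rng.standard_normal((3, d_k))   # q1, q2, q3
k = rng.standard_normal((3, d_k))   # k1, k2, k3
v = rng.standard_normal((3, d_k))   # v1, v2, v3

# Step 2: dot product of each Query with every Key.
scores = q @ k.T                    # shape (3, 3)

# Step 3: divide by sqrt(d_k) = 8.
scores = scores / np.sqrt(d_k)

# Step 4: softmax each row into normalized weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

# Step 5: weighted sum of the Value vectors; row 0 is z1 for "Action".
z = weights @ v
z1 = z[0]
print(z1.shape)                     # (64,)
```

Each row of `weights` sums to 1, so z1 really is a weighted average of v1, v2, v3, which is exactly the point made next.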

That’s it: attention is just a fancy name for a weighted average!
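In matrix form, this is the standard scaled dot-product attention from the original Transformer paper (“Attention Is All You Need”); the symbol d_k for the Key dimension comes from that paper, not this article:

```latex
\mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```

The softmax term supplies the weights of the average, and the rows of V are the vectors being averaged.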
