Named Entity Recognition (NER) is the process of extracting targeted information from a piece of text, such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
Let’s revisit our previous example, where we asked our music assistant bot to “play Coldplay”. Intuitively, the command has an intent (to play something) and an entity (what to play). Given “play Coldplay”, a chatbot would classify the intent as “play music” and classify Coldplay as an entity of type Artist.
An entity is anything that exists in the real world, and can be a person, place, product, organization, or a concept.
Depending upon the application, there can be a large variety of entity types. For example, in news articles, entities could be people, places, companies, and organizations. In healthcare, entities could be diseases, drugs, and procedures. In the military, entities could be weapons, ships, and people.
There are many ways to extract the important information from text.
They range from a simple solution, such as rule-based string matching, to an extremely complex one, such as understanding the implicit context of a sentence and extracting the entity based on that context. For example, “Play Cricket” and “Play Coldplay” share the same verb, but only context tells us that one names a sport and the other an artist.
A simple example of string or pattern matching is identifying car number plates in a particular country. Because the format is fixed, we can write a regular expression that extracts it from a sentence.
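To make the number plate idea concrete, here is a minimal sketch in Python. The plate format is an assumption chosen purely for illustration (two letters, two digits, one or two letters, four digits, with optional spaces or hyphens between groups); a real system would use the official format of the country in question. The names `PLATE_PATTERN` and `extract_plates` are hypothetical.

```python
import re

# Assumed plate format (illustrative only): two letters, two digits,
# one or two letters, four digits, with an optional space or hyphen
# between the groups.
PLATE_PATTERN = re.compile(r"\b[A-Z]{2}[ -]?\d{2}[ -]?[A-Z]{1,2}[ -]?\d{4}\b")


def extract_plates(text: str) -> list[str]:
    """Return every substring of `text` that matches the assumed plate format."""
    return PLATE_PATTERN.findall(text)


sentence = "The car KA 01 AB 1234 was parked next to MH12DE5678."
print(extract_plates(sentence))  # ['KA 01 AB 1234', 'MH12DE5678']
```

This works only because the format is rigid. As soon as the entity has no fixed shape (an artist name, for instance), a pattern like this is no longer enough.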
A complex sentence can be understood by identifying its implicit context, and the entities can then be extracted using the information in that context. An entity can be of any type, such as person, organization, place, event, product, service, time, or quantity.
Example: Apple announced the launch of their new device last Thursday that will compete with the likes of Samsung and Google. The device will be launched in India on the 11th of September.
Here, “last Thursday” refers to the most recent Thursday before the current date. To resolve it, we use information available in the context: starting from the current date, we step back one day at a time until we reach a Thursday.
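The resolution of a relative expression such as “last Thursday” can be sketched as follows. This is a minimal illustration, not a full time-expression parser: it assumes “last Thursday” means the most recent Thursday strictly before the reference date, and the helper name `resolve_last_weekday` and the reference date are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_last_weekday(weekday_name: str, today: date) -> date:
    """Return the most recent `weekday_name` strictly before `today`."""
    target = WEEKDAYS.index(weekday_name.lower())
    # Number of days to step back. If today already is the target weekday,
    # the modulo gives 0, so we go back a full week instead.
    days_back = (today.weekday() - target) % 7 or 7
    return today - timedelta(days=days_back)


# The reference date is the "context"; here it is an arbitrary Tuesday.
reference = date(2024, 9, 10)
print(resolve_last_weekday("Thursday", reference))  # 2024-09-05
```

The key point is that the text alone does not contain the answer. The same phrase resolves to a different date whenever the reference date changes.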
Entity Extraction is a very important task in the following situations:
1. Information Retrieval: to fetch relevant information from a large collection of documents. Example: When we say “Play Coldplay”, we need to ensure that Coldplay exists in our database and then retrieve the information associated with it.
2. Knowledge Discovery: to find information about entities and the relations between them. Example: Identifying Google as an ORG and Alphabet as its parent company. Based on this relation, we can infer that Alphabet is also an ORG.
3. Information Filtering: to recommend relevant documents based on the identified entity. Example: We can recommend Maroon 5 based on the interest in Coldplay.
4. Question Answering: to answer questions about entities (who, what, when, where, why). Example: When was Google founded, who are its founders, and what are its business model and mode of operation?
5. Security: to identify and track suspicious activities. Example: Identifying and categorizing entities that are not present in the database, such as non-members of an organization, to prevent them from accessing confidential information.
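The database check mentioned under Information Retrieval, and the detection of unknown entities mentioned under Security, can both be sketched with a simple dictionary lookup. This is a minimal sketch under stated assumptions: `CATALOGUE` is a hypothetical in-memory stand-in for a real database, the labels are invented for illustration, and the command is assumed to have the form “play <mention>”.

```python
from typing import Optional

# Hypothetical catalogue standing in for a real database of known entities.
CATALOGUE = {
    "coldplay": "ARTIST",
    "maroon 5": "ARTIST",
    "cricket": "SPORT",
}


def lookup_entity(command: str) -> Optional[tuple[str, str]]:
    """Return (mention, label) for a "play <mention>" command, or None
    if the command has another shape or the mention is not in the catalogue."""
    words = command.strip().split(maxsplit=1)
    if len(words) != 2 or words[0].lower() != "play":
        return None
    mention = words[1].strip()
    label = CATALOGUE.get(mention.lower())
    if label is None:
        return None  # unknown entity: not present in the catalogue
    return mention, label


print(lookup_entity("Play Coldplay"))  # ('Coldplay', 'ARTIST')
print(lookup_entity("Play Cricket"))   # ('Cricket', 'SPORT')
print(lookup_entity("Play Nirvana"))   # None
```

A lookup like this only recognizes entities that are already listed. Handling unseen names or ambiguous mentions is where the context-based approaches described earlier become necessary.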