Most of us are familiar with the tests and exams we have taken throughout our lives, from school exams to college admissions tests to a driving test or a corporate ethics test. I was recently taking one of these tests and wondered whether a bot or an AI system could take the test, and whether such an intelligent system exists today. I am referring in particular to tests that require reading, learning, and reasoning. In this article, I will share the latest advancements in AI and analyze the latest QA (Question Answering) systems through an example course and exam.
Latest from AI Advancements and QA systems
- AI systems can now read and summarize, do creative writing, do facial recognition (shoppers are being tracked), hear and detect sounds, answer simple factual questions (think Google Search), speak and schedule appointments (remember Google Duplex), smell and detect illnesses, touch and pick berries (robots in the fields), walk and move like humans (think Boston Dynamics), understand emotions and treat mental health (X2 is one such company) and do many more activities (read more here)
- AI2’s (Allen Institute) Aristo AI system excelled at a standardized 8th grade science test in 2019. However, it looks like Aristo cannot really “understand” concepts and is simply using word associations and similar tactics to pass a test. The 8th grade test was also easy: it was text only, with no diagrams, and the questions had multiple-choice answers to select from. The institute, though, is doing cutting-edge research in building machines that can read, learn and reason.
- But can AI “learn to learn”? Can AI systems do only specialized tasks, or can they become general-purpose learners? A Forbes article talks about the human abilities of meta-reasoning and meta-learning and about Google’s MultiModel, which was designed to do a variety of tasks simultaneously, from image recognition to language translation.
- The world of language understanding and text generation is getting better with the launch of GPT-3, developed by OpenAI, and a learning tool (LearnFromAnyone) built using GPT-3 (where you can almost talk to and learn from Elon Musk or Chadwick Boseman).
Can AI take a reading, comprehension and question-answer test?
Now, I am going to analyze if any state-of-the-art AI system could take a reading and question answering test and pass. For the sake of simplicity, let’s assume the course is a corporate requirement on “How to act in a workplace or office environment to protect yourself and others from COVID”. Most corporations might soon require employees to complete such a course. The course content includes:
- guidelines to maintain social distancing such as staying at least 6 ft. from others at all times
- protocols on using masks, protective gear and sanitization materials
- rules for engaging in conversations or discussions, eating or drinking, or using public spaces like the toilet, common rooms, cafeteria, etc.
A typical corporate learning test is often a mix of a few types of questions as shown below and requires the candidate to pass every question in order to pass the exam. Questions are almost always multiple choice varying between a binary Yes/No and many answer choices. Employees can take the test more than once during the same or multiple sittings, but are required to pass it at least once. Broadly, I would categorize the questions in a test like this as one of the following.
Examining common question types and state-of-the-art solutions
- Simple questions with a binary Yes/No answer — these types of questions are based on more explicit knowledge from the course content and require basic comprehension abilities. A system with basic language and logic skills should be able to understand course content and pass this test easily. Examples of these types of questions might include:
“When entering the office, you should maintain at least 6 ft of distance from others. You should let those ahead of you enter first.” If you agree, choose Yes and if not, choose No
“It is okay to take your mask off when you are talking to a colleague at work.” If you agree, choose Yes and if not, choose No
Solution: A system to answer these questions correctly might simply need to do smart text or semantic matching between the question and the course content. Using the latest language models and technologies that Google, Bing or AI2 have developed, this should not be a hard problem to solve. Google already picks up passages from the pages it indexes and shows relevant answers for user questions (see screenshot below and read more here). There are many ways to develop an AI system to answer these questions: using the latest text matching techniques, training a model on previously taken tests, or developing systems that can learn and reason.
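To make the text matching idea concrete, here is a minimal sketch of the simplest version: score each course passage against the question using bag-of-words cosine similarity and pick the closest one. The `COURSE` passages and the function names are my own illustrative assumptions, not part of any real course or library. Note that this only finds the relevant passage; deciding Yes or No would still need a further step that checks whether the statement agrees with or contradicts that passage.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_passage(question, passages):
    """Return (score, passage) for the passage most similar to the question."""
    q = Counter(tokenize(question))
    return max((cosine(q, Counter(tokenize(p))), p) for p in passages)


# Hypothetical course content, paraphrased from the guidelines listed earlier.
COURSE = [
    "Stay at least 6 ft from others at all times when entering the office.",
    "Keep your mask on when talking to a colleague at work.",
    "Sanitize your hands before using common rooms or the cafeteria.",
]

score, passage = best_passage(
    "It is okay to take your mask off when you are talking to a colleague at work.",
    COURSE,
)
print(round(score, 2), passage)
```

Here the question lands on the mask passage even though the two say opposite things ("take your mask off" versus "keep your mask on"). That is exactly why pure word overlap is a starting point for retrieval rather than a complete answering system.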
- Nuanced questions: these questions would require a student or a system to understand not just explicit but also implicit lessons. For example, the questions below refer to 3 ft. and 10 ft., which might not be distances used in the course content but can be derived from it. Similarly, if the course content says “masks must never be removed at the workplace”, the system would need to understand that this includes every spot or corner marked as workplace (including indoor and outdoor cafes, prayer rooms, rooftops, balconies, kitchens, bathrooms, etc. inside an office building or complex).
“It is okay to stay 3 ft. apart from a colleague from your team when entering the office.” If you agree, choose Yes and if not, choose No
“Having lunch while staying 10 ft. away from others and keeping your masks off is okay.” If you agree, choose Yes and if not, choose No
“It is okay to take the mask off when sitting by yourself in a prayer room”. If you agree, choose Yes and if not, choose No
Solution: These questions are not that straightforward and might require more thorough language understanding models that can rewrite sentences, generate synonyms and, essentially, create variants of the course content that deepen the system’s understanding of it. For example, a guideline like “It is always recommended to stay more than 6 ft. apart from anyone inside the office premises” might be rewritten as: “It is always recommended to (and never allowed to not) stay (stand, talk with, chat with, have lunch with, etc.) more than 6 ft. (like 7 ft. or 10 ft.) apart (away or further or distanced) from anyone (any person, friends, colleagues, managers, supervisors, etc.) inside (at, around, within) the office premises (building, complex, location).”
This is one of several ways to enrich a system’s understanding of course content so it is better equipped to answer questions. Google’s latest advances in transformer models such as BERT, and wider AI advances in question answering, might be great solutions too. AI2’s Aristo has developed better learning and reasoning techniques which might come in handy.
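The rewriting idea above can be sketched in a few lines: treat the guideline as a template with slots, and fill each slot with every alternative to generate paraphrases a matching system could then search over. The `SLOTS` table and the slot names are hypothetical and hand-written here; a real system might derive the alternatives from a thesaurus or a language model instead.

```python
from itertools import product

# Hypothetical, hand-written alternatives for each slot in the guideline.
SLOTS = {
    "verb": ["stay", "stand", "have lunch"],
    "person": ["anyone", "colleagues", "managers"],
    "place": ["office premises", "office building", "office complex"],
}

TEMPLATE = (
    "It is always recommended to {verb} more than 6 ft. apart "
    "from {person} inside the {place}."
)


def expand(template, slots):
    """Yield every sentence obtained by filling each slot with each alternative."""
    names = list(slots)
    for combo in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


variants = list(expand(TEMPLATE, SLOTS))
print(len(variants))
for sentence in variants[:3]:
    print(sentence)
```

The number of variants is the product of the slot sizes, so it grows quickly as slots are added. In practice one would probably cap the expansion or generate variants only for the passages most relevant to a given question.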
- Situational questions: these questions are harder and require understanding the question well, finding relevant course content, sometimes looking for relevant information not available in the course content, and reasoning to arrive at the answer. For example, the question below requires understanding that keeping a mask over your mouth and nose is always mandatory except while putting food in your mouth AND that it is okay to ask for reasonable accommodations when interacting with colleagues, friends or other people.
“Tiara is a colleague from your team whom you really enjoy working with. Tiara invites you to lunch with her in the cafeteria and insists you do not put on your mask at all, as she cannot hear people clearly with their masks on. What would you tell Tiara?”
Choose from the following: (a). Tell Tiara you can have lunch with her but that you would want both her and you to keep the masks on when not putting food in your mouth; (b). Tell Tiara you cannot have lunch with her and stop any communication with her thereafter; (c). Report Tiara to the Building Management committee
Solution: These questions might be the hardest as they require a lot more comprehension ability, logic and an understanding of not just the course content but also everyday knowledge on certain topics. For example, the above question requires an understanding of social protocols when talking to colleagues or friends. AI2 has a UnifiedQA system that seems capable of answering questions with high accuracy on the few datasets it has been trained on, can be fine-tuned to a specialized task, and seems to be a good starting point for QA systems. There is a lot of advanced research happening in combining knowledge from multiple sentences or sources to answer questions (check out paper 0, paper 1, paper 2, and paper 3).
- Write the answer questions: Another type of question, not typically found in a corporate test setting but common in school or college exams, asks the candidate to write the answer. These questions are harder and might require thorough comprehension or complex language understanding and generation models (like GPT-3). There is not one right answer to these questions but many, as every individual might express what they have learnt in their own style. The answer could range from a few words to a few sentences, or an essay in some cases. For example, for the first two questions below, one could simply copy or memorize the most relevant words or sentences from the course material. For the first question, the answer is more obvious, while for the second, it could be derived. The third question would require a system to either copy the most relevant sentences or understand, reason and write its own response.
“What is the appropriate minimum distance to maintain from colleagues at work?”
“Who is in charge of ensuring we all follow apt social distancing protocols at work?”
“Describe a few important guidelines on keeping a safe social distance from colleagues when at a workplace or office. Write a clear response in fewer than 100 words.”
A more complex form of these questions might describe a situation and ask you to write an answer. For example, for the question above on Tiara, one might be asked to write a response. Again, there can be many valid responses to this question. A candidate or a system can approach it in many ways; this might require listing ideas, structuring them and organizing them in a clear, comprehensible manner for the reader. Even listing ideas will require knowledge from the course content and, in addition, knowledge from outside it. GPT-3 has shown promise in this area and seems capable of generating text and writing articles and stories as well as humans.
As we can see above, there are many advancements happening across language understanding and text matching, learning and reasoning, and language generation, with the advent of complex models that have billions of parameters and train on troves of data using massive computation power. The use cases and benefits of these new systems are also many, from answering the simple and complex questions people have every day to writing summaries to curating creative content and more. From shopping to education to healthcare to entertainment, AI and ML systems will continue to make our lives easier. While this happens, we have to make sure we are aware of any privacy, security, fake information or ethics issues that arise as machines learn more and more tasks are automated.