How AI Will End the One-Size-Fits-All Approach in Human Assessment

Imagine that you walk into a store to buy a nice suit (or a dress). You browse for a while and finally find one that you really like. When you ask the sales associate to help you find the right size, they say, “We only sell one-size-fits-all clothes. You can try on the suit in the fitting room and see if it actually fits you.” This story may sound like dystopian fiction because most clothing stores today offer a range of sizes, and many also offer tailoring or alteration services. You would probably never buy a one-size-fits-all suit.

Surprisingly, when it comes to human assessment (e.g., tests, quizzes, and recruitment exams), we happen to be big fans of the one-size-fits-all approach. When we need to measure a particular skill or ability, we often prefer a simple assessment that has the same questions for all individuals, regardless of their ability levels. In doing so, we implicitly hope that the assessment will “fit” all individuals well enough, which is rarely the case. This is why, after the very same assessment, some individuals argue that the questions were too easy while others argue that they were too difficult. For these individuals, one-size-fits-all assessments fail to indicate how good (or bad) their performance really is.

So, is it possible to tailor an assessment to each individual? The answer is yes! In fact, the idea of tailoring an assessment to each individual goes all the way back to the early 1900s. Alfred Binet, a French psychologist, created what is widely regarded as the first practical intelligence test, known as the Binet-Simon test [1]. Binet’s test consisted of a variety of questions normed by the chronological age of children (from 3 years through 11 years). The test started with a set of questions at the child’s age level. Depending on the child’s performance (e.g., mostly correct or mostly incorrect answers), the examiner then administered the questions at the next higher (or lower) age level [2].

Binet’s tailored approach to human assessment was absolutely innovative and groundbreaking, but a bit laborious. Thanks to modern technology, we are now able to automate Binet’s tailored assessment approach via computers.

Computerized Adaptive Testing

Computerized adaptive testing (CAT) is a form of computer-based assessment that follows the idea of tailored human assessment (or tailored testing). CAT aims to create a tailored assessment for each individual by selecting the most suitable questions based on their responses to the previously administered questions. For example, if an individual answers a question of moderate difficulty correctly, then CAT assumes that this individual’s ability level is above the difficulty level of the question and thus it presents a more difficult question in the next round. If, however, the individual is not able to answer the question correctly, then she/he is administered an easier question in the next round (see Figure 1 below for an illustration of this process).

Figure 1. Flowchart of a typical CAT administration (Image by Author)

The question selection process described above continues until a stopping criterion is met (e.g., answering the maximum number of questions, reaching the allotted time, having a certain level of precision in estimating the individual’s final score). Overall, CAT allows the assessment to tailor (or adapt) itself to each individual’s level of ability, instead of following a one-size-fits-all approach. Some benefits of using CAT include:

far fewer questions on the assessment,
time saved by skipping questions that are either too hard or too easy, and
high and uniform precision in the estimation of individuals’ ability levels.
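
The selection loop and stopping rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any operational testing program implements CAT: it assumes a Rasch (one-parameter logistic) model, a hypothetical item bank described only by difficulty values, a standard normal prior on ability, and a simple grid-based ability estimate. The names `p_correct`, `estimate_ability`, and `run_cat`, as well as the default limits of 20 questions and a standard error of 0.5, are made up for this example.

```python
import math

# Candidate ability values from -4 to 4 in steps of 0.1 (an assumed scale).
GRID = [g / 10 for g in range(-40, 41)]

def p_correct(theta, b):
    """Rasch model: chance of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Posterior mean and standard deviation of ability over GRID.

    `responses` is a list of (difficulty, answered_correctly) pairs.
    A standard normal prior is assumed.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta ** 2)  # unnormalized prior weight
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, answer, max_items=20, target_se=0.5):
    """Administer items adaptively until a stopping criterion is met.

    `item_bank` is a list of item difficulties; `answer` is a callable that
    takes a difficulty and returns True (correct) or False (incorrect).
    """
    remaining = list(item_bank)
    responses = []
    theta, se = 0.0, float("inf")  # start from an average ability
    # Stop when the bank is empty, the question limit is reached,
    # or the ability estimate is precise enough.
    while remaining and len(responses) < max_items and se > target_se:
        # Under the Rasch model, the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = estimate_ability(responses)
    return theta, se, responses

# A deterministic stand-in for a test-taker: correct whenever the
# question difficulty is below 1.0.
bank = [d / 4 for d in range(-12, 13)]  # 25 difficulties from -3 to 3
theta, se, responses = run_cat(bank, answer=lambda b: b < 1.0)
print(f"estimated ability: {theta:.2f}, standard error: {se:.2f}, "
      f"questions used: {len(responses)}")
```

In this sketch, a correct answer pulls the ability estimate up, so the next question chosen is harder, and an incorrect answer does the opposite. Operational systems typically add further constraints, such as content balancing and limits on how often a question is shown, which are left out here.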

Today, many high-stakes assessments, such as the Armed Services Vocational Aptitude Battery (ASVAB), the National Council Licensure Examination (NCLEX), and the Graduate Management Admission Test (GMAT), use the CAT approach for evaluating candidates’ level of ability for various purposes, such as military entrance processing, licensing of nurses, and business school applications.

What Else?

Photo by Markus Winkler on Unsplash

Computerized adaptive testing, or CAT, is obviously a very good start to tailoring human assessments. However, it is not necessarily an artificial intelligence (AI) approach. With the help of AI, it is possible to further customize human assessments in other ways.

One of the best AI applications in human assessment is intelligent tutoring systems. These systems create a personalized learning environment for individuals based on their ability levels and areas of interest. Through a digital tutor trained with deep learning algorithms, individuals engage with customized learning materials recommended by the digital tutor, complete assessment tasks, and then receive feedback on their performance [3]. For example, Amira is a digital reading assistant designed for first and second graders. As students read stories aloud at their own pace, Amira employs automatic speech recognition to listen to their reading in real time, assesses their oral reading fluency, and intervenes when necessary.

Automated essay scoring is another advanced technology that employs natural language processing (NLP) techniques to automatically grade written text, ranging from responses to short-answer questions all the way to long essays. Automated essay scoring relies on building a scoring algorithm based on linguistic features extracted from texts that have already been scored by human raters. The scoring algorithm can then be used to score the responses (or essays) of a new group of individuals [4]. With assessments using automated essay scoring, individuals get the opportunity to use their own words as they answer the questions, which is another form of tailoring.
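
To make the two steps above concrete (extract features from human-scored text, then score new text), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three toy features, the four invented training essays and their scores, and the function names `features`, `train`, and `predict`. A simple nearest-neighbour rule stands in for the regression or machine learning models that real scoring engines rely on, which also use far richer linguistic features.

```python
import math

def features(text):
    """Three toy features: word count, average word length, vocabulary diversity."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    words = [w for w in words if w]
    n = len(words)
    if n == 0:
        return (0.0, 0.0, 0.0)
    avg_len = sum(len(w) for w in words) / n
    diversity = len(set(words)) / n  # share of distinct words
    return (float(n), avg_len, diversity)

def train(scored_essays):
    """Store standardized features next to the human scores.

    `scored_essays` is a list of (text, human_score) pairs.
    """
    rows = [features(text) for text, _ in scored_essays]
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a feature does not vary, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    scaled = [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]
    scores = [score for _, score in scored_essays]
    return {"means": means, "sds": sds, "rows": scaled, "scores": scores}

def predict(model, text, k=2):
    """Average the human scores of the k most similar scored essays."""
    x = tuple((v - m) / s
              for v, m, s in zip(features(text), model["means"], model["sds"]))
    ranked = sorted(zip(model["rows"], model["scores"]),
                    key=lambda pair: math.dist(x, pair[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)

# Invented examples with invented human scores on a 1-5 scale.
training = [
    ("Cats are nice.", 1),
    ("I like dogs because dogs are fun and dogs are good.", 2),
    ("Regular assessment helps teachers identify misconceptions early "
     "and adjust instruction accordingly.", 4),
    ("Adaptive testing selects questions according to each examinee's "
     "estimated ability, which shortens the assessment while preserving "
     "measurement precision.", 5),
]
model = train(training)
new_essay = ("Tailored assessments choose suitable questions for every "
             "student, improving precision and saving considerable time.")
print(predict(model, new_essay))
```

Surface features like these reward length and vocabulary rather than the quality of an argument, which is one reason a sketch of this kind should not be mistaken for a usable scoring engine.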

In addition to tailoring the assessment itself, it is also possible to tailor how the assessment is implemented. For example, my colleagues and I have recently developed an intelligent recommender system that can guide teachers on when to evaluate their students’ learning during the school year. Our goal was to help teachers create a “personalized” assessment schedule for each student that considers the pace at which students acquire knowledge. The system helps the teacher find an optimal assessment schedule based on each student’s progress and thereby avoid overtesting students.

A group of researchers from Carnegie Mellon University have recently created a new method that allows teachers to build their own lessons and assessments in intelligent tutoring systems. With this method, teachers will be able to work with the AI-based digital tutor to further customize learning activities and assessments for students.

In addition to teachers and educators, companies and organizations can also use AI to create tailored talent assessments for recruitment and selection purposes. For example, some organizations have already begun to use virtual assistants (i.e., chatbots) that simulate human conversation with candidates and help human resources (HR) teams with recruitment decisions [5]. AI-supported tailored assessments may also help reduce the impact of human biases and stereotypes on selection decisions.

I believe that the more we accept and embrace AI, the better (and more tailored) assessments we will be able to create in the future.

[1] Binet, A., & Simon, Th. A. (1905). Méthode nouvelle pour le diagnostic du niveau intellectuel des anormaux. L’Année Psychologique, 11, 191–244.

[2] http://iacat.org/node/442

[3] Paviotti, G., Rossi, P. G., & Zarka, D. (2012). Intelligent tutoring systems: an overview. Pensa Multimedia, 1–176.

[4] Gierl, M. J., Latifi, S., Lai, H., Boulais, A. P., & De Champlain, A. (2014). Automated essay scoring and the future of educational assessment in medical education. Medical Education, 48(10), 950–962.

[5] https://www.selectsoftwarereviews.com/buyer-guide/hr-chat-bots
