Test proctoring services have an unhealthy reliance on AI
Contrary to popular belief, Artificial Intelligence (AI) can’t fix everything. Sure, AI has afforded breakthroughs in self-driving cars and healthcare, but that raises the question of whether there are areas AI should stay out of. If ever there was such an area, it’s online test proctoring. These test-proctoring services include Respondus, ProctorU, Proctorio, Examity, and Honorlock. The platforms have gained popularity since the start of remote learning, and they have met strong resistance from students. Setting aside the ethical implications of their data collection, let’s explore some of the inherent problems that stem from using AI in these high-stakes situations.
Much like Mr. Miyagi’s famous line from the 1984 Karate Kid, “Eyes! Always look eyes!”, most of these systems use AI to track eye and facial movements during the exam. This is considered fair practice because, well, you’re sitting in front of a computer that could look up the answers. Unfortunately, the use of AI doesn’t stop at video processing. According to this complaint filed by the Electronic Privacy Information Center (EPIC),
“First, the AI applies “advanced algorithms” to detect faces, motion, and lighting changes. Next, the AI analyzes keyboard activity, mouse movements, hardware changes, and other computer device data to identify patterns and outliers associated with cheating. Finally, the AI analyzes each student’s exam interactions, including a question-by-question analysis to compare against other students who took the same exam[86]”. [1]
But think about that for a second. Doesn’t it seem like they’re using AI as a band-aid for all of the potential vulnerabilities in their software? The thinking seems to be: we have access to all of this sensitive data, so why not run an AI model on it and see if anything sticks out? The reality, though, is that there are plenty of legitimate reasons why a student’s data might deviate from the norm.
For example, I love that when taking an online test, I can highlight a question and have my Mac read it to me. An AI model like this is all but guaranteed to flag my exam if I highlight every question. Does that mean I was cheating and deserve a 0, along with damage to the way my professor views my work?
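To make that concrete, here is a rough sketch of the kind of outlier check that a “question-by-question analysis to compare against other students” implies: score each student’s behavior against the class and flag whoever lands far from the mean. The feature (“text selections per exam”), the numbers, and the threshold are all hypothetical illustrations on my part; no vendor publishes its actual model, which is part of the problem.

```python
# A minimal sketch of naive outlier flagging over per-student interaction
# features. The feature, values, and threshold are made up for illustration,
# not taken from any actual proctoring product.
import statistics

def z_score_flags(values, threshold=2.5):
    """Flag any value more than `threshold` standard deviations from the class mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [abs(v - mean) / stdev > threshold for v in values]

# Hypothetical "text selections per exam" counts for a small class.
# Most students rarely highlight; one student highlights every question
# so their Mac can read it aloud.
selections = [2, 0, 1, 3, 0, 2, 1, 0, 2, 1, 45]

for student, (count, flagged) in enumerate(zip(selections, z_score_flags(selections)), start=1):
    print(f"student {student}: {count} selections -> {'FLAGGED' if flagged else 'ok'}")
```

Run it and the accessibility user is the only student flagged, not because they cheated, but because they behave differently from the class average.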
So let’s say you’re tasked with building a surveillance system with advanced facial recognition and eye tracking. An inherent problem is that there is a plethora of outside “noise” variables that are hard to account for: bad lighting, reading glasses, a poor internet connection, and low webcam quality all pose major problems. Another problem is how you’re going to train this model; it has to learn from data what to look for. I know this because, well, I’ve spent time looking at code that does just this for slot machines. And it’s well established that facial recognition systems aren’t perfect.
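For a sense of how brittle the video side is, here is a minimal sketch of frame-by-frame face checking built on OpenCV’s stock Haar cascade, with “exactly one face visible” as the pass condition. This illustrates the general technique, not any vendor’s actual pipeline; the frame budget and the flag rule are my own assumptions.

```python
# A minimal sketch of the fragile core of webcam proctoring: classical face
# detection on each frame, treating "no face" or "extra faces" as suspicious.
# Illustrative only -- not any proctoring vendor's real code.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def frame_looks_suspicious(frame) -> bool:
    """Return True when anything other than exactly one face is detected."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detection quality collapses with dim lighting, glare on glasses, or a
    # noisy low-resolution webcam -- exactly the "noise" variables above.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) != 1

def proctor(camera_index: int = 0, max_frames: int = 300) -> float:
    """Sample webcam frames and report the fraction flagged as suspicious."""
    cap = cv2.VideoCapture(camera_index)
    flagged = total = 0
    while total < max_frames:
        ok, frame = cap.read()
        if not ok:
            break  # dropped frames from a bad connection silently shrink the sample
        total += 1
        flagged += frame_looks_suspicious(frame)
    cap.release()
    return flagged / max(total, 1)

if __name__ == "__main__":
    print(f"suspicious frames: {proctor():.0%}")
```

Every one of the “noise” variables above degrades this check directly: dim lighting and glare cause missed detections, a choppy connection drops frames, and the student gets flagged for conditions entirely outside their control.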
“…facial recognition systems have repeatedly been found less effective at identifying or evaluating faces of color. A 2017 study of facial recognition systems found that darker-skinned females were 32 times more likely to be misclassified than lighter-skinned males.” [1]
So right off the bat, your system is going to misclassify students of color more often. That’s a huge problem, and fixing it requires a lot of testing. Another failure mode is that these AI systems struggle with disabilities. Say, for example, a student has a lazy eye, which makes it hard to tell what they’re looking at. Or say a student has a medical condition that causes a facial tic, like involuntary mouth movements. Maybe the people making proctoring software see these demographics as unimportant, but the reality is that these groups are put at a disadvantage by bad software. In my opinion, this is lazy testing and a clear failure to think through the implications of what you’re making.
Part of the problem with building “cheat-proof” software is that you have to be hyper-aware of any potential deviation. That has clearly led to over-collecting data and over-analyzing the situation. And because they want their software to remain “cheat-proof”, these companies are extremely protective of their code and secretive about how they use the data they collect behind the scenes. For students on the receiving end, that opacity just adds anxiety at an extremely vulnerable time.
AI is a band-aid over a bigger problem with test-taking. As Carol Dweck points out in her TED talk below, the power of “Not Yet” is more lasting than a failing grade. The problem lies with students doing whatever it takes to get a passing grade when they should be focused on sucking the marrow out of the class and growing with the content. I think we, as a society, should write more AI-powered software that helps students learn instead of software that helps them fail.
Leave a comment with other cases where you think AI should not be used to solve systemic problems.