Texas is replacing thousands of human exam graders with AI

Students in Texas taking their state-mandated exams this week are serving as guinea pigs for a new artificial intelligence-powered scoring system that is set to replace most of the state's human graders.

The Texas Tribune reports that the Texas Education Agency (TEA) is rolling out an "automated scoring engine" to grade open-ended questions on the State of Texas Assessments of Academic Readiness (STAAR) exams. The engine uses natural language processing, the technology that enables chatbots like OpenAI's ChatGPT to understand and communicate with users. The agency expects the system to save $15–20 million per year by reducing the need for temporary human scorers, and it plans to hire fewer than 2,000 graders this year, compared with the 6,000 it needed in 2023.

The STAAR exams, which test students in grades three through eight on their understanding of the core curriculum, were redesigned last year to include fewer multiple-choice questions. They now contain up to seven times more open-ended questions. TEA director of student assessment Jose Rios said the agency "wanted to keep as many constructed open-ended responses as we can, but they take an incredible amount of time to score."

According to a slideshow hosted on TEA's website, the new scoring system was trained on 3,000 exam responses that had already received two rounds of human grading. The agency has also built in some safety nets: a quarter of all computer-graded responses will be rescored by humans, for example, as will answers that confuse the AI system, such as those containing slang or written in a language other than English.

While TEA is optimistic that AI will enable it to save buckets of cash, some educators aren’t so keen to see it implemented. Lewisville Independent School District superintendent Lori Rapp said her district saw a “drastic increase” in constructed responses receiving a zero score when the automated grading system was used on a limited basis in December 2023. “At this time, we are unable to determine if there is something wrong with the test question or if it is the new automated scoring system,” Rapp said.

AI essay-scoring engines are nothing new. A 2019 report from Motherboard found that they were being used in at least 21 states, with varying degrees of success, though TEA seems determined to avoid the same reputation. Small print on TEA's slideshow also stresses that its new scoring engine is a closed system that is inherently different from AI, in that "AI is a computer using progressive learning algorithms to adapt, allowing the data to do the programming and essentially teaching itself."

The attempt to draw a line between the two isn't surprising, given that there's no shortage of teachers despairing online about students using generative AI services to cheat on assignments and homework. The students graded by this new system may find it hard to accept what looks to them like a case of "rules for thee and not for me."