Like millions of other people, Mark Humphries first used ChatGPT, when it was released in late 2022, to perform parlor tricks such as writing poetry in the style of Bob Dylan. The results were impressive, but they did not seem particularly useful to a historian studying the 18th-century fur trade. Still, Humphries, a 43-year-old professor at Wilfrid Laurier University in Waterloo, Canada, had long been interested in applying artificial intelligence to his work. He was already using a specialized text recognition tool designed to transcribe antiquated scripts and typefaces, though it made frequent errors that took time to correct. Curious, he pasted the tool’s garbled interpretation of a handwritten French letter into ChatGPT. The chatbot corrected the text, fixing every F that had been misread as an S and even adding missing accents. Then Humphries asked ChatGPT to translate the letter into English. It did that, too. Maybe, he thought, this thing would be useful after all.
For Humphries, AI tools held a tantalizing promise. Over the last decade, millions of documents in archives and libraries have been scanned and digitized (Humphries was involved in one such effort himself), but their wide variety of formats, fonts, and vocabulary rendered them impenetrable to automated search, so working with them still required stupendous amounts of manual research. For a previous project, Humphries pieced together biographies of several hundred shell-shocked World War I soldiers from assorted medical records, war diaries, newspapers, personnel files, and other ephemera. It had taken years and a team of research assistants to read, tag, and cross-reference the material for each individual. If the new language models were as powerful as they seemed, he thought, it might be possible to simply upload all this material and ask a model to extract every document related to each soldier diagnosed with shell shock.