Artificial intelligence has proven its prowess in areas like software development and content creation. However, a recent study reveals its limitations when it comes to advanced historical questions.
A team of researchers developed a specialized benchmark, Hist-LLM, to evaluate how well leading large language models (LLMs) answer complex historical queries. These models (OpenAI’s GPT-4, Meta’s Llama, and Google’s Gemini) were tested against the Seshat Global History Databank, an extensive repository of historical data named after the ancient Egyptian deity of wisdom.
The findings, showcased at the prestigious NeurIPS AI conference last month, were underwhelming. GPT-4 Turbo emerged as the top performer but achieved only 46% accuracy, barely surpassing random guesses. Researchers from the Complexity Science Hub (CSH) in Austria led the study, emphasizing the gap between AI’s capabilities and the demands of high-level historical analysis.
“The main takeaway from this study is that LLMs, while remarkable, still lack the depth of understanding required for advanced history. They’re great for basic facts, but when it comes to more nuanced, PhD-level historical inquiry, they’re not yet up to the task,” said Maria del Rio-Chanona, a co-author of the study and an associate professor of computer science at University College London.
The study included examples of questions where LLMs faltered. For instance, GPT-4 Turbo incorrectly asserted that scale armor existed in ancient Egypt during a specific era, even though this technology didn’t appear there until 1,500 years later. Similarly, the model erroneously claimed that ancient Egypt maintained a professional standing army during a particular period, likely due to biases in its training data.
“If you get told A and B 100 times, and C 1 time, and then get asked a question about C, you might just remember A and B and try to extrapolate from that,” del Rio-Chanona explained, highlighting a key limitation of LLMs in handling less prominent historical details.
The study also uncovered regional biases in the models’ performance: accuracy lagged significantly for regions like sub-Saharan Africa. This suggests that the training data used for these models may lack sufficient representation of certain historical contexts.
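A regional gap of this kind is typically surfaced by scoring the benchmark separately for each region. The following is only a minimal sketch of that bookkeeping, not the researchers' actual evaluation code: the function name `accuracy_by_region`, the record layout, and the toy records are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """Return {region: fraction of correct answers}.

    Each record is a (region, predicted, expected) tuple. This layout is an
    assumption made for illustration, not the benchmark's real schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for region, predicted, expected in records:
        total[region] += 1
        if predicted == expected:
            correct[region] += 1
    return {region: correct[region] / total[region] for region in total}


# Hypothetical toy records, NOT data from the study.
toy_records = [
    ("Region A", "present", "present"),
    ("Region A", "absent", "absent"),
    ("Region A", "present", "absent"),
    ("Region A", "absent", "absent"),
    ("Region B", "present", "absent"),
    ("Region B", "absent", "absent"),
]

scores = accuracy_by_region(toy_records)
for region, score in sorted(scores.items()):
    print(f"{region}: {score:.0%}")
```

Comparing the per-region scores against the overall figure is what makes an imbalance visible, since a single headline accuracy number can hide it.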
Peter Turchin, the study’s lead researcher and a faculty member at CSH, noted that while LLMs are not yet ready to replace human historians, they hold promise as tools for assisting historical research. The team is refining their benchmark by incorporating more data from underrepresented regions and increasing the complexity of questions.
“While our results highlight areas where LLMs need improvement, they also underscore the potential for these models to aid in historical research,” the paper concludes.
How is artificial intelligence being used to analyze and interpret ancient texts?
Interview: Exploring the Intersection of AI and Ancient History with Dr. Elena Sommerschield
Location: Virtual Studio, Archyde Newsroom
Date: January 19, 2025
Editor (Archyde): Today, we have the privilege of speaking with Dr. Elena Sommerschield, a leading researcher in the application of artificial intelligence to historical studies. Dr. Sommerschield’s work has been instrumental in unlocking ancient texts and redefining how we understand history. Welcome, Dr. Sommerschield!
Dr. Sommerschield: Thank you for having me. It’s a pleasure to be here.
Editor: Your recent work has been making waves in both the tech and historical communities. Could you start by giving us an overview of how AI is being used to study ancient texts?
Dr. Sommerschield: Absolutely. The goal of our research is to design tools that enhance the capabilities of historians and archaeologists. AI, particularly neural networks, allows us to analyze vast archives of ancient texts at a scale and speed that would be unfeasible for humans alone. For example, we can identify patterns, connections, and even hidden meanings in fragmented or incomplete texts. This helps us reconstruct historical narratives with greater accuracy.
Editor: That’s interesting. However, there’s been some discussion about the limitations of AI when it comes to advanced historical questions. Could you shed some light on this?
Dr. Sommerschield: Certainly. While AI is incredibly powerful, it’s not a silver bullet. One of the key challenges is context. Ancient texts often rely on cultural, linguistic, and historical nuances that AI can struggle to understand without extensive human guidance. For example, a word or phrase might have different meanings depending on the time period or region. AI can flag potential connections, but it’s up to human researchers to interpret them accurately.
Another limitation is the quality of the data. Many ancient texts are fragmented or damaged, which can lead to incomplete or biased outputs from AI models. So, while AI is a valuable tool, it’s most effective when used in collaboration with human expertise.
Editor: That’s a crucial point. How do you see the relationship between AI and human researchers evolving in the future?
Dr. Sommerschield: I believe we’re moving toward a more symbiotic relationship. AI can handle the heavy lifting (processing large datasets, identifying patterns, and suggesting hypotheses), while human researchers provide the critical thinking, contextual understanding, and creativity needed to make sense of the results. Together, they can achieve far more than either could alone.
For example, in one of our projects, AI helped us identify a previously unknown link between two ancient civilizations. However, it was our team of historians who contextualized this connection and explained its significance.
Editor: That’s a great example of collaboration. What are some of the ethical considerations in using AI for historical research?
Dr. Sommerschield: Ethical concerns are paramount. One major issue is the potential for bias in AI models. If the training data is skewed or incomplete, the outputs may perpetuate historical inaccuracies or misconceptions. We need to ensure that the datasets we use are as comprehensive and representative as possible.
Additionally, there’s the question of transparency. Researchers must be clear about how AI is being used and what its limitations are. We don’t want to give the impression that AI is infallible or that it can replace human judgment.
Editor: Those are vital considerations. Looking ahead, what excites you most about the future of AI in historical research?
Dr. Sommerschield: The potential to uncover new insights is incredibly exciting. AI has already helped us rediscover lost texts and reinterpret existing ones. As the technology continues to evolve, I believe we’ll see even more groundbreaking discoveries.
But beyond that, I’m excited about the democratization of knowledge. AI tools can make historical research more accessible to people around the world, allowing a broader audience to engage with and contribute to our understanding of the past.
Editor: That’s a hopeful vision. Thank you, Dr. Sommerschield, for sharing your insights with us today. It’s clear that AI is transforming the way we study history, and your work is at the forefront of this exciting field.
Dr. Sommerschield: Thank you. It’s been a pleasure discussing these ideas with you.
End of Interview