"It Depends, But Now We Can Measure Why": Benchmarking Specialised and General-Purpose AI on Humanities Tasks

"It depends" is the honest answer to most questions about AI tool choice — but until now, there has been no rigorous way to measure what it actually depends on. The RISE Humanities Data Benchmark changes that, providing an extensible framework that evaluates specialised tools like Transkribus and general-purpose LLMs side by side on real humanities tasks: from HTR of 15th-century manuscripts to structured table extraction from 20th-century personnel cards. With eleven diverse datasets and transparent, expert-defined ground truth, the benchmark gives project teams the evidence they need to make informed decisions about tools, accuracy, and cost per document.