Towards a "living" HTR Evaluation Platform

Handwritten text recognition (HTR) has a transparency problem: every project benchmarks its own models on its own data with its own metrics, which makes meaningful comparison nearly impossible. This comes just as large multimodal models arrive with bold claims of having solved historical text recognition. This roundtable confronts the problem openly, inviting developers, historians, archivists, and ML practitioners to work together on the foundations of a shared evaluation infrastructure. Discussion focuses on five pillars: curating a representative corpus of historical source types; defining a taxonomy of document difficulty; moving beyond character and word error rates (CER/WER) to multi-dimensional metrics that include layout analysis and semantic accuracy; accounting for environmental footprint and data sovereignty; and designing a platform that evolves continuously alongside the technology. The goal is not a finished proposal but a concrete roadmap.