Training HTR on Manuscripts That Refuse to Cooperate: Ink, Insects, and Adverse Conditions

The 18th-century notarial books from Salvador da Bahia have survived ink corrosion, insect damage, and severe staining against the odds. They now face a new challenge: being read by handwritten text recognition (HTR) models trained mostly on well-preserved manuscripts. This talk confronts the 'preservation privilege' embedded in HTR benchmarks and asks what it takes to build models that can read what time has almost erased.