Beyond Linear Text: Training Transkribus Models on Multi-Alphabet Tabular Records from Transylvania

Five alphabets, three languages, one complex tabular source. Transylvanian parish registers from the 19th century are a formidable challenge for handwritten text recognition (HTR). This talk presents the latest results: a Hungarian model achieving a 2% character error rate (CER) and a new Romanian model at 8% CER on 600,000 words, with an expanding training corpus intended to reduce error rates further.

Part of

Panel 2 · From Records to Rows