Does Transcription Quality Matter? Automated Thematic Indexing (Annif) of Bernese Legal Texts Applied to Different HTR Models
Does better transcription quality actually lead to better automated indexing? Using 4,930 Bernese legal texts and the Annif subject-indexing tool, this paper puts the question to the test by comparing 2020 HTR output with improved 2024 TrOCR transcriptions and by examining how transcription format (diplomatic vs. running text) affects the accuracy of thematic metadata assignment. The results have real implications for how archives balance accuracy and discoverability.