Harmonising Automatic Text Recognition Workflows

Video recordings of the DHIP/IHA Tutorial Series on Automatic Text Recognition in the Humanities (Deutsches Historisches Institut Paris / Institut Historique Allemand | Spring 2024)
Production and Editing by Paul Ramisch | Co-Editors-in-Chief: Anne Baillot and Mareike König

This Tutorial Series on Automatic Text Recognition (ATR) in the Humanities covers the full ATR workflow for research projects, teaching when and how to use ATR technology to effectively extract text from images.
Each of the six videos guides interested users through a crucial step of the ATR process: getting started, acquiring images, optimising images, analysing layouts, recognising text and training models, ensuring quality, and exploring end formats and reusability.

Explore our video tutorials on Automatic Text Recognition (ATR) and learn how to efficiently extract full text from heritage material images.
Perfect for researchers, librarians and archivists, these resources not only enhance your archival research and preservation efforts but also unlock the potential for computational analysis of your sources.
Our six videos guide you through the entire workflow.
This is the series teaser.

Teaser: Harmonising Automatic Text Recognition Workflows
Speakers: Pauline Spychala, David Lassner, Hippolyte Souvay, Hugo Scheithauer, Floriane Chiffoleau, Sarah Ondraszek
YouTube DHIP/IHA Channel | 19/06/2024

Kick off your journey into Automatic Text Recognition with our introductory tutorial video.
This session outlines the entire workflow of Humanities research projects utilising ATR to extract full text from scanned images.
We provide an overview of each step in the process and introduce subsequent tutorials that delve deeper into these steps.
Additionally, a ‘How to get started with ATR’ road map, linked below, guides you through the important questions and gives you basic orientation before you start an ATR project.

Video 1: How To Get Started
Script: Ariane Pinche, Pauline Spychala
Speaker: Pauline Spychala
YouTube DHIP/IHA Channel | 17/04/2024

Discover the foundational steps of Automatic Text Recognition in our second tutorial video, focused on acquiring images for ATR.
This video explores where and how to find, create and collect images of textual material, a crucial initial step in any ATR-based research.
Learn about the typical methods for obtaining high-quality scanned images, setting the stage for successful text recognition processes.

Video 2: Get Images
Script: Anna Busch, David Lassner, Aneta Plzáková
Speaker: David Lassner
YouTube DHIP/IHA Channel | 04/05/2024

Join us for our third ATR tutorial video, where we delve into the critical process of image optimisation for Automatic Text Recognition.
This video covers the scanning process, essential considerations for high-quality scans and key pre-processing steps like cropping and dewarping.
We also discuss common challenges encountered during the pre-processing stage of ATR, preparing you to handle them effectively.

Video 3: Image Optimisation
Script: Hippolyte Souvay, Larissa Will
Speaker: Hippolyte Souvay
YouTube DHIP/IHA Channel | 06/05/2024

Discover how computers perform layout analysis in our fourth ATR tutorial video.
This video explains how ATR technology identifies the structural elements of a document and localises text lines, processes also known as segmentation, zoning or document analysis.
Gain insights into optical layout analysis, essential for efficiently processing and understanding heritage texts with ATR.

Video 4: Layout Analysis
Script: Alix Chagué, Hugo Scheithauer
Speaker: Hugo Scheithauer
YouTube DHIP/IHA Channel | 05/05/2024

Explore the core concepts of text recognition and model training in our fifth ATR tutorial video.
This session breaks down the essentials of creating accurate models, including understanding ground truth data.
Perfect for enhancing your ATR skills, the video equips you with the knowledge to improve text extraction from heritage materials.

Video 5: Text Recognition and Post-ATR Correction
Script: Floriane Chiffoleau, Sarah Ondraszek
Speaker: Floriane Chiffoleau
YouTube DHIP/IHA Channel | 06/05/2024

This concluding video of our ATR tutorial series focuses on integrating Open Science standards in Humanities research.
Learn how to apply these principles for transparent and reproducible results and discover the best formats for exporting and reusing your transcriptions.
Ensure your ATR project results are accessible and ready for future scholarly use.

Video 6: End Formats and Reusability
Script: Floriane Chiffoleau, Sarah Ondraszek
Speaker: Sarah Ondraszek
YouTube DHIP/IHA Channel | 06/05/2024