The dataset associated with this competition, called the HBA 1.0 dataset, was collected from the French digital library Gallica.
The HBA 1.0 dataset consists of 4,436 real scanned, ground-truthed historical document images from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries.
The following figures show a few sample pages from each book in the HBA 1.0 dataset.
Each “Book Id.” links to the corresponding historical book in the French digital library Gallica (only low-resolution images are publicly available online).