Q1: Can we download all the images of the dataset?
In the first stage of the HBA competition, only a sample dataset and its ground truth (in TXT and PNG formats) are made available. The sample dataset gives an overview of the data to be handled in the evaluation dataset. Participants are free to use the sample dataset for training, testing or any other purpose related to the competition. Only registered participants will have access to the test dataset.
Q2: Will a limited number of manually annotated pages (30/40 pages) per book be available for the evaluation dataset as for the sample dataset?
The HBA competition aims at evaluating methods that automatically annotate a large number of book pages based on a limited number of manually annotated pages of the same book. These manually annotated pages constitute the training image dataset. It is therefore ensured that each content class is represented in the set of training pages for each book.
Note that the content classes in the HBA 1.0 dataset vary from one book to another and are strongly imbalanced within a book. Textual content predominates over graphical content in monographs, and within the textual content the great majority is body text, while other character fonts are more marginal.
Consequently, the class distribution in the training dataset will not match that of the test dataset. Minority classes will be adequately represented in the training dataset to make the learning task feasible, whereas majority classes will be proportionally less represented in the training dataset than in the test dataset. These requirements have been satisfied in the selection of the training pages in the sample dataset, and the same rules will be applied to the selection of the training pages in the evaluation dataset.
Q3: Do you have a comprehensive list of the six classes involved in the competition, and a mapping of these classes to their numeric annotations in the ground-truth dataset?
The six content classes involved in the HBA competition are listed below.
TXT: Class label | PNG: Class color (BGR values) | Description |
1 | Blue (255, 0, 0) | Graphics |
2 | Green (0, 255, 0) | Normal text |
3 | Red (0, 0, 255) | Capitalized text |
4 | Cyan (255, 255, 0) | Handwritten text |
5 | Yellow (0, 255, 255) | Italic text |
6 | Magenta (255, 0, 255) | Footnote text |
The number of content classes varies from one book to another (i.e. it may range from 2 to 6).
In the TXT version of the ground truth, the pixels representing graphical content are annotated with the label value 1, while the pixels representing textual content are annotated with a label value other than 1 (i.e. 2, 3, 4, 5 or 6).
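The mapping in the table above can be written down as a small lookup, which may help when converting between the TXT and PNG versions of the ground truth. This is only a sketch: the names HBA_CLASSES, label_to_bgr and bgr_to_label are hypothetical and not part of any official HBA tool. The colors are copied in BGR order as given in the table, so a library that loads images in RGB order would need the tuples reversed.

```python
# Class label -> (description, BGR color), copied from the table above.
# The names used here are illustrative assumptions, not an official API.
HBA_CLASSES = {
    1: ("Graphics", (255, 0, 0)),
    2: ("Normal text", (0, 255, 0)),
    3: ("Capitalized text", (0, 0, 255)),
    4: ("Handwritten text", (255, 255, 0)),
    5: ("Italic text", (0, 255, 255)),
    6: ("Footnote text", (255, 0, 255)),
}

# Reverse lookup: BGR color -> class label.
BGR_TO_LABEL = {color: label for label, (_, color) in HBA_CLASSES.items()}


def label_to_bgr(label):
    """Return the BGR color used in the PNG ground truth for a TXT label."""
    return HBA_CLASSES[label][1]


def bgr_to_label(bgr):
    """Return the TXT class label for a BGR color from the PNG ground truth."""
    return BGR_TO_LABEL[tuple(bgr)]
```

For instance, label_to_bgr(2) gives the green tuple of the normal text class, and bgr_to_label((255, 0, 255)) gives the footnote text label.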
Q4: Does the first challenge consist of two classes, graphics (class 1) and font (all other classes as one single class)?
In the first challenge, our goal is to evaluate the ability of the participating methods to distinguish between text and graphics. In the evaluation task, we consider two classes: graphics (label value = 1) and text (all text classes [2-6], which represent different font types, are treated as one single class).
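The merging of the text classes for the first challenge can be sketched as follows. The function name to_challenge1_label and the choice of 2 as the value of the merged text class are assumptions made for illustration only; the answer above only states that labels 2 to 6 count as one class.

```python
GRAPHICS = 1
# Assumption: the merged text class is represented by the value 2.
# The competition only requires that labels 2-6 be treated as one class.
TEXT = 2


def to_challenge1_label(label):
    """Map a six-class label (1-6) to the two-class setting of the first challenge."""
    if label == GRAPHICS:
        return GRAPHICS
    if 2 <= label <= 6:
        return TEXT
    raise ValueError(f"unexpected class label: {label}")
```

Applied to the labels 1, 2, 5 and 6, this returns 1, 2, 2 and 2 respectively.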
Q5: For the second challenge, are all six classes [1-6] included in the second challenge, or does the second challenge consist only of classes [2-6] (text classes)?
In the second challenge, we aim at assessing the ability of the participating methods to separate different content types, including different font types. In the evaluation task, each content type has its own class (i.e. all six classes [1-6] are included). Each textual class in the ground truth of the training images, representing a specific text font, has to be considered as a separate class.
Q6: How will the doubly labeled pixel samples defined in the ground truth be treated in the evaluation dataset? How should the doubly labeled pixel samples defined in the ground truth be treated in the training dataset? Should we predict multiple labels for these locations? Should we ignore them? Are the training and evaluation tasks N-way classifications without allowing multiple class labels per pixel location?
Any pixel with more than one class label must be ignored in both the training and the evaluation tasks. Such pixels will be taken into account neither in the evaluation dataset nor in the training dataset. Therefore, the training and the evaluation tasks are effectively N-way classifications without allowing multiple class labels per pixel location.
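One possible way to apply this rule before training or scoring is to filter the annotations first. The sketch below assumes a hypothetical in-memory layout, a dictionary from (row, column) positions to sets of class labels; the actual layout of the TXT files is not described here, so loading them into this form is left open. The function name drop_multi_label_pixels is likewise an assumption.

```python
def drop_multi_label_pixels(annotations):
    """Keep only pixels that carry exactly one class label.

    annotations: dict mapping (row, col) -> set of class labels
    (a hypothetical layout, not the official TXT format).
    Pixels with several labels are ignored, as are pixels with no label.
    """
    return {
        position: next(iter(labels))
        for position, labels in annotations.items()
        if len(labels) == 1
    }
```

With the annotations {(0, 0): {2}, (0, 1): {2, 3}, (1, 0): {1}}, the doubly labeled pixel at (0, 1) is dropped and the other two pixels keep their single label.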
Q7: Why are there empty files in the test folder of many books of the HBA 1.0 dataset?
Some TXT files in the test folder are empty because their corresponding TIFF images do not have annotated pixels belonging to the foreground layer (handwritten or printed text or graphics) in their ground truth. These empty files usually correspond to images of cover pages or empty pages in the digitized book.
Q8: Should I label normal text in Book 7 as 2 or as 3 in what I eventually submit?
You should label each book page of the evaluation dataset according to the provided training dataset.
In the case of Book 7, the pixels representing the normal text are annotated with label values equal to 2 (green), while the pixels representing the capitalized text are annotated with label values equal to 3 (red).
Q9: Should I generate a TXT or a PNG file as output?
Participants are free to choose either version and may send us their results in the TXT or the PNG format.