Two input files for participants (TIFF image and TXT/PNG file) are provided for each image of the test dataset.
1- TXT: The input TXT file contains only the coordinates of the selected foreground pixels to be classified. Participants return an output file that is a copy of the input TXT file, filled out with the class label of each foreground pixel. The output must therefore be in TXT format (e.g. for input image “T0000007.TIFF”, the output should be “T0000007.TXT”). The ground truth and output files share the same structure: each line contains three values, namely the two coordinates of the selected foreground pixel and its class label, which represents the content type in the analyzed book.
2- PNG: The input PNG file is a pixel-labeled image in which the selected foreground pixels are coloured white. Participants should provide as output a pixel-labeled image that uses the BGR values defined in the training files. The output must therefore be in PNG format (e.g. for input image “T0000007.TIFF”, the output should be “T0000007.PNG”).
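The TXT workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `output_name` and `fill_labels` are hypothetical, the coordinate order and the whitespace separator are guesses (check the distributed files for the actual layout), and `predict` stands in for whatever classifier a participant uses.

```python
def output_name(image_name, ext="TXT"):
    """Derive the result file name from the input image name.

    Example from the text: "T0000007.TIFF" -> "T0000007.TXT".
    """
    stem = image_name.rsplit(".", 1)[0]
    return f"{stem}.{ext}"


def fill_labels(input_lines, predict):
    """Copy each coordinate line and append the predicted class label.

    ASSUMPTION: every input line holds two whitespace-separated integer
    coordinates; the real separator and coordinate order may differ.
    `predict` is a hypothetical callable taking the two coordinates and
    returning a label in 1..6.
    """
    filled = []
    for line in input_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        a, b = int(parts[0]), int(parts[1])
        filled.append(f"{a} {b} {predict(a, b)}")
    return filled
```

A participant would read the lines of the input TXT file, pass them to `fill_labels` together with their own classifier, and write the result to the file named by `output_name`.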
The six content classes involved in the HBA competition are listed below.
TXT: Class label | PNG: Class color (BGR values) | Description |
1 | Blue (255, 0, 0) | Graphics |
2 | Green (0, 255, 0) | Normal text |
3 | Red (0, 0, 255) | Capitalized text |
4 | Cyan (255, 255, 0) | Handwritten text |
5 | Yellow (0, 255, 255) | Italic text |
6 | Magenta (255, 0, 255) | Footnote text |
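The correspondence between TXT labels and PNG colours in the table above can be written as a small lookup, shown here as a sketch. The names `LABEL_TO_BGR`, `label_to_bgr` and `bgr_to_label` are illustrative and not part of the competition material; treating any colour outside the table as unlabeled is also an assumption.

```python
# Class label -> colour, in BGR channel order as given in the table.
LABEL_TO_BGR = {
    1: (255, 0, 0),    # Blue: graphics
    2: (0, 255, 0),    # Green: normal text
    3: (0, 0, 255),    # Red: capitalized text
    4: (255, 255, 0),  # Cyan: handwritten text
    5: (0, 255, 255),  # Yellow: italic text
    6: (255, 0, 255),  # Magenta: footnote text
}

# Reverse lookup: colour -> class label.
BGR_TO_LABEL = {bgr: label for label, bgr in LABEL_TO_BGR.items()}


def label_to_bgr(label):
    """Return the BGR colour for a class label in 1..6."""
    return LABEL_TO_BGR[label]


def bgr_to_label(bgr):
    """Return the class label for a BGR colour, or None if the colour
    is not one of the six class colours (assumed to mean 'unlabeled')."""
    return BGR_TO_LABEL.get(tuple(bgr))
```

Note that the values are in BGR order. An image library that returns pixels in RGB order would need the channels reversed before the lookup.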
The number of content classes varies from one book to another (i.e. it may range from 2 to 6).
In the TXT version of the ground truth, pixels representing graphical content are annotated with label 1, while pixels representing textual content are annotated with a label other than 1 (i.e. 2, 3, 4, 5 or 6).
Two nested challenges are proposed in the HBA competition:
1- In the first challenge, the goal is to evaluate how well the participating methods distinguish between text and graphics. For the evaluation, we consider two classes: graphics (label value = 1) and text (all text classes [2-6], which represent different font types, are treated as a single class).
2- In the second challenge, we aim to assess how well the participating methods separate the different content types, including the different font types. For the evaluation, each content type has its own class (i.e. all six classes [1-6] are included). Each textual class in the ground truth of the training images, which represents a specific text font, has to be considered as a separate class.
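The relationship between the two challenges amounts to a relabeling step, sketched below. The function name `to_challenge1` and the value 2 chosen for the merged text class are assumptions made for illustration; the text only states that labels 2 to 6 are evaluated as one class in the first challenge.

```python
GRAPHICS = 1
TEXT = 2  # ASSUMPTION: value chosen here for the merged text class


def to_challenge1(label):
    """Collapse a six-class label (1..6) into the two-class scheme.

    Label 1 stays graphics; labels 2..6 (the font-specific text
    classes) are merged into a single text class.
    """
    if label == GRAPHICS:
        return GRAPHICS
    if 2 <= label <= 6:
        return TEXT
    raise ValueError(f"unexpected class label: {label}")
```

For the second challenge no relabeling is needed, since all six labels are kept as separate classes.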
The following illustrates the input format specifications of an image sample of the HBA 1.0 dataset which is already available for download.
Training file | Test file |
– Input image (.TIFF) – Training file (.PNG) – Training file (.TXT) | – Input image (.TIFF) – Test file (.PNG) – Test file (.TXT) |
In the TXT version, any pixel with more than one class label must be ignored: such pixels are taken into account in neither the training dataset nor the evaluation dataset. In the HBA competition, the training and evaluation tasks are N-way classifications that do not allow multiple class labels per pixel location.
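One way to apply this rule when loading a TXT file is sketched below. It assumes that a multi-labeled pixel shows up as the same coordinates repeated with different labels, which the text does not spell out; the function name `drop_multilabel` and the tuple layout are likewise illustrative.

```python
from collections import defaultdict


def drop_multilabel(records):
    """Keep only pixels that carry exactly one distinct class label.

    `records` is an iterable of (coord_a, coord_b, label) tuples.
    ASSUMPTION: a pixel with several labels appears as several records
    sharing the same coordinates. Exact duplicate records are merged.
    """
    labels_at = defaultdict(set)
    for a, b, label in records:
        labels_at[(a, b)].add(label)
    # Dicts preserve insertion order, so pixels keep first-seen order.
    return [
        (a, b, next(iter(labels)))
        for (a, b), labels in labels_at.items()
        if len(labels) == 1
    ]
```

For example, given records for pixel (1, 1) with labels 2 and 3 and for pixel (2, 2) with label 1, only the record for (2, 2) is kept.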