Inproceedings,

Document Domain Randomization for Deep Learning Document Layout Extraction

M. Ling, J. Chen, T. Möller, P. Isenberg, T. Isenberg, M. Sedlmair, R. Laramee, H. Shen, J. Wu, and C. Giles.
Document Analysis and Recognition (ICDAR), pages 497--513. Springer International Publishing, (2021)
DOI: 10.1007/978-3-030-86549-8_32

Abstract

We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation. DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest, with user-defined layout and font styles to support joint learning of fine-grained classes. We demonstrate competitive results using our DDR approach to extract nine document classes from the benchmark CS-150 and papers published in two domains, namely annual meetings of Association for Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR to conditions of style mismatch, fewer or more noisy samples that are more easily obtained in the real world. We show that high-fidelity semantic information is not necessary to label semantic classes but style mismatch between train and test can lower model accuracy. Using smaller training samples had a slightly detrimental effect. Finally, network models still achieved high test accuracy when correct labels are diluted towards confusing labels; this behavior holds across several classes.

BibTeX key: ling2021icdar
entry type: inproceedings
booktitle: Document Analysis and Recognition (ICDAR)
year: 2021
pages: 497--513
publisher: Springer International Publishing
isbn: 978-3-030-86549-8
DOI: 10.1007/978-3-030-86549-8_32
url: https://arxiv.org/abs/2105.14931


Cite this publication

@inproceedings{ling2021icdar,
  abstract = {We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation. DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest, with user-defined layout and font styles to support joint learning of fine-grained classes. We demonstrate competitive results using our DDR approach to extract nine document classes from the benchmark CS-150 and papers published in two domains, namely annual meetings of Association for Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR to conditions of style mismatch, fewer or more noisy samples that are more easily obtained in the real world. We show that high-fidelity semantic information is not necessary to label semantic classes but style mismatch between train and test can lower model accuracy. Using smaller training samples had a slightly detrimental effect. Finally, network models still achieved high test accuracy when correct labels are diluted towards confusing labels; this behavior holds across several classes.},
  added-at = {2021-12-09T09:48:00.000+0100},
  author = {Ling, Meng and Chen, Jian and M{\"o}ller, Torsten and Isenberg, Petra and Isenberg, Tobias and Sedlmair, Michael and Laramee, Robert S and Shen, Han-Wei and Wu, Jian and Giles, C Lee},
  biburl = {https://puma.ub.uni-stuttgart.de/bibtex/25a26e9bfa68d27a00d9f7b069b59a64d/christinawarren},
  booktitle = {Document Analysis and Recognition (ICDAR)},
  doi = {10.1007/978-3-030-86549-8_32},
  interhash = {92d774ec21a97a026ffea4e322985b5c},
  intrahash = {5a26e9bfa68d27a00d9f7b069b59a64d},
  isbn = {978-3-030-86549-8},
  keywords = {2021 FFG ViSciPub visus visus:sedlmaml},
  pages = {497--513},
  publisher = {Springer International Publishing},
  timestamp = {2021-12-09T08:48:00.000+0100},
  title = {Document Domain Randomization for Deep Learning Document Layout Extraction},
  url = {https://arxiv.org/abs/2105.14931},
  year = 2021
}
