Linguistic Annotations for a Diachronic Corpus of German
DOI: https://doi.org/10.33011/lilt.v7i.1271
Keywords: treebank, German, linguistic annotation
Abstract
This paper describes the TüBa-D/DC, a diachronic corpus of German that draws on selected materials from the German Gutenberg Project and enriches them with several layers of linguistic annotation, including part-of-speech tags, lemmata, and constituent structure. The annotation is performed automatically with statistical tools trained on data from the Tübinger Baumbank des Deutschen (TüBa-D/Z). To assess annotation quality, the POS tagging is evaluated on a sample of texts ranging from the 13th to the 20th century. The paper concludes with a description of three query mechanisms provided for the user.
License
This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.