Linguistic Annotations for a Diachronic Corpus of German

Erhard Hinrichs; Thomas Zastrow

doi:10.33011/lilt.v7i.1271

Authors

Erhard Hinrichs, Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen
Thomas Zastrow, Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen

DOI:

https://doi.org/10.33011/lilt.v7i.1271

Keywords:

treebank, German, linguistic annotation

Abstract

This paper describes the TüBa-D/DC, a diachronic corpus of German that uses selected materials from the German Gutenberg Project and enriches them with several layers of linguistic annotation, including part-of-speech tags, lemmata, and constituent structure. The annotation is performed automatically with statistical tools trained on data from the Tübinger Baumbank des Deutschen (TüBa-D/Z). To assess annotation quality, the POS tagging is evaluated on a sample of texts ranging from the 13th to the 20th century. The paper concludes with a description of the three query mechanisms provided for the user.

Published

2012-01-01

How to Cite

Hinrichs, E., & Zastrow, T. (2012). Linguistic Annotations for a Diachronic Corpus of German. Linguistic Issues in Language Technology, 7. https://doi.org/10.33011/lilt.v7i.1271

Issue

Vol. 7 (2012): Treebanks and Linguistic Theory

Section

Articles

License

This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.
