Linguistic Annotations for a Diachronic Corpus of German

Authors

  • Erhard Hinrichs, Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen
  • Thomas Zastrow, Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen

DOI:

https://doi.org/10.33011/lilt.v7i.1271

Keywords:

treebank, German, linguistic annotation

Abstract

This paper describes the TüBa-D/DC, a diachronic corpus of German that draws on selected materials from the German Gutenberg Project and enriches them with several layers of linguistic annotation, including part-of-speech tags, lemmas, and constituent structure. The annotation is performed automatically using statistical tools trained on data from the Tübinger Baumbank des Deutschen (TüBa-D/Z). To assess annotation quality, the POS tagging is evaluated on a sample of texts ranging from the 13th to the 20th century. The paper concludes with a description of three query mechanisms provided to users.

Published

2012-01-01

How to Cite

Hinrichs, E., & Zastrow, T. (2012). Linguistic Annotations for a Diachronic Corpus of German. Linguistic Issues in Language Technology, 7. https://doi.org/10.33011/lilt.v7i.1271