Constructing Integrated Corpus and Lexicon Models for Multi-Layer Annotation in OWL DL

Authors

  • Aljoscha Burchardt, Technische Universität Darmstadt
  • Sebastian Padó, Stanford University
  • Dennis Spohr, Universität Stuttgart
  • Anette Frank, Universität Heidelberg
  • Ulrich Heid, Universität Stuttgart

DOI:

https://doi.org/10.33011/lilt.v1i.1191

Keywords:

corpus linguistics, lexicon

Abstract

We present a general approach to formally modelling corpora with multi-layered annotation in a typed logical representation language, OWL DL. By defining abstractions over the corpus data, we can generalise from a large set of individual corpus annotations, thereby inducing a lexicon model. The resulting combined corpus and lexicon model can be interpreted as a graph structure that offers flexible querying functionality beyond current XML-based query languages. Its powerful methods for characterising and checking consistency can be used for incremental model refinement. In addition, the formalisation in a graph-based structure offers the means of defining flexible lexicon views over the corpus data. These views can be tailored for linguistic inspection or to define clean interfaces with other linguistic resources. We illustrate our approach by applying it to the syntactically and semantically annotated SALSA/TIGER corpus, a collection of German newspaper text.
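
As a rough illustration of the idea of inducing a lexicon model by abstracting over individual corpus annotations, the following minimal sketch stores annotations as plain subject-predicate-object triples and aggregates them into per-lemma, per-frame role realisation counts. It is not the authors' OWL DL model: the property names (`hasLemma`, `evokesFrame`, `hasRole`, `roleName`, `realisedAs`), the identifiers, and the sample data are all hypothetical, and a simple Python list stands in for a real ontology store and reasoner.

```python
from collections import Counter, defaultdict

# Hypothetical corpus annotations as (subject, predicate, object) triples.
# All identifiers and property names are illustrative assumptions.
TRIPLES = [
    ("ann1", "hasLemma", "kaufen"),
    ("ann1", "evokesFrame", "Commerce_buy"),
    ("ann1", "hasRole", "role1"),
    ("role1", "roleName", "Buyer"),
    ("role1", "realisedAs", "NP"),
    ("ann2", "hasLemma", "kaufen"),
    ("ann2", "evokesFrame", "Commerce_buy"),
    ("ann2", "hasRole", "role2"),
    ("role2", "roleName", "Buyer"),
    ("role2", "realisedAs", "NP"),
    ("ann2", "hasRole", "role3"),
    ("role3", "roleName", "Goods"),
    ("role3", "realisedAs", "NP"),
]


def objects(triples, subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def induce_lexicon(triples):
    """Abstract over annotation instances: for each (lemma, frame) pair,
    count how often each role is realised by each syntactic category."""
    lexicon = defaultdict(Counter)
    annotations = [s for s, p, _ in triples if p == "evokesFrame"]
    for ann in annotations:
        lemma = objects(triples, ann, "hasLemma")[0]
        frame = objects(triples, ann, "evokesFrame")[0]
        for role in objects(triples, ann, "hasRole"):
            name = objects(triples, role, "roleName")[0]
            category = objects(triples, role, "realisedAs")[0]
            lexicon[(lemma, frame)][(name, category)] += 1
    return lexicon


LEXICON = induce_lexicon(TRIPLES)
```

Looking up `LEXICON[("kaufen", "Commerce_buy")]` then gives a lexicon-style view of how the roles of that frame are realised across the sample annotations, while the underlying triples remain available for instance-level queries.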

Published

2008-06-01

How to Cite

Burchardt, A., Padó, S., Spohr, D., Frank, A., & Heid, U. (2008). Constructing Integrated Corpus and Lexicon Models for Multi-Layer Annotation in OWL DL. Linguistic Issues in Language Technology, 1. https://doi.org/10.33011/lilt.v1i.1191

Section

Articles