Adding linguistic information to parsed corpora

Susan Pintzuk

doi:10.33011/lilt.v18i.1435

Authors

Susan Pintzuk, University of York

DOI:

https://doi.org/10.33011/lilt.v18i.1435

Abstract

No matter how comprehensively corpus builders design their annotation schemes, users frequently find that information they need for their research is missing. In this methodological paper I describe and illustrate five methods of adding linguistic information to corpora that have been morphosyntactically annotated (i.e., parsed) in the style of Penn treebanks. Some of these methods involve manual operations; some are executed by CorpusSearch functions; some require a combination of manual and automated procedures. Which method is used depends almost entirely on the type of information to be added and the goals of the user. Of course, the main goal, regardless of method, is to record within the corpus additional information that can be used for analysis and also retained through further searches and data processing.

Published

2019-07-01

How to Cite

Pintzuk, S. (2019). Adding linguistic information to parsed corpora. Linguistic Issues in Language Technology, 18. https://doi.org/10.33011/lilt.v18i.1435

Issue

Vol. 18 (2019): Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Section

Articles

License

This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.
