A Hybrid Approach to Error Detection in a Treebank: Its Impact on Manual Validation Time

Rahul Agarwal; Bharat Ram Ambati; Dipti Misra Sharma

doi:10.33011/lilt.v7i.1303

Authors

Rahul Agarwal, LTRC, IIIT-Hyderabad
Bharat Ram Ambati, LTRC, IIIT-Hyderabad
Dipti Misra Sharma, LTRC, IIIT-Hyderabad

Keywords:

treebank, error detection, Hindi

Abstract

Treebanks are a linguistic resource: a large database in which the morphological, syntactic and lexical information for each sentence has been explicitly marked. The critical need for treebanks in various NLP activities (research and application) is well known. This also implies that treebanks need to be as error free as possible. However, manual validation of a treebank is very costly in terms of both time and money. This paper describes an approach to automatically detect errors in a treebank after a complete manual annotation. Over and above improving an earlier error detection tool (Ambati et al., 2011) for a Hindi treebank, we also present a user study to show that our system reduces the validation time significantly while detecting 81.49% of the errors at the dependency level.

Published

2012-01-01

How to Cite

Agarwal, R., Ambati, B. R., & Sharma, D. M. (2012). A Hybrid Approach to Error Detection in a Treebank: Its Impact on Manual Validation Time. Linguistic Issues in Language Technology, 7. https://doi.org/10.33011/lilt.v7i.1303

Issue

Vol. 7 (2012): Treebanks and Linguistic Theory

Section

Articles

License

This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.