A Hybrid Approach to Error Detection in a Treebank

Its Impact on Manual Validation Time

Authors

  • Rahul Agarwal, LTRC, IIIT-Hyderabad
  • Bharat Ram Ambati, LTRC, IIIT-Hyderabad
  • Dipti Misra Sharma, LTRC, IIIT-Hyderabad

DOI:

https://doi.org/10.33011/lilt.v7i.1303

Keywords:

treebank, error detection, Hindi

Abstract

Treebanks are a linguistic resource: a large database in which the morphological, syntactic and lexical information for each sentence has been explicitly marked. Treebanks are well known to be critical for many NLP activities, in both research and applications. They therefore need to be as error-free as possible. However, manual validation of a treebank is very costly in both time and money. This paper describes an approach to automatically detecting errors in a treebank after complete manual annotation. The approach improves on an earlier error detection tool for a Hindi treebank (Ambati et al., 2011). We also present a user study showing that our system significantly reduces validation time while detecting 81.49% of the errors at the dependency level.

Published

2012-01-01

How to Cite

Agarwal, R., Ambati, B. R., & Sharma, D. M. (2012). A Hybrid Approach to Error Detection in a Treebank: Its Impact on Manual Validation Time. Linguistic Issues in Language Technology, 7. https://doi.org/10.33011/lilt.v7i.1303