Bootstrapping the Development of an HPSG-based Treebank for Persian

Authors

  • Masood Ghayoomi, Freie Universität Berlin

DOI:

https://doi.org/10.33011/lilt.v7i.1301

Keywords:

HPSG, Persian, CLaRK, annotation

Abstract

In this paper, we describe ongoing research to develop an HPSG-based treebank for Persian. To this end, we use a bootstrapping approach to data annotation. In the first step, a set of seed rules is defined as regular expressions in the CLaRK system. The data is then shallow-processed with this set of rules. In the next step, a human annotator completes the annotation of the sentences manually. To increase the share of automatic annotation, we extract the manually applied rules and iteratively augment the seed rules with the rules applied frequently in the manual annotation. Our experiment in building the Persian treebank, which currently contains 1000 sentences, shows that the proposed method reduces human intervention from 74.05% in the first iterations to 39.01% in the last iterations.
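The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. It is not the CLaRK system and not the author's implementation: the function name `bootstrap`, the rule strings, the batch layout, and the `min_count` frequency threshold are all hypothetical. Manual annotation is simulated by listing, for each sentence, the rule applications its complete annotation needs; any rule not yet in the rule set counts as human intervention.

```python
from collections import Counter


def bootstrap(batches, seed_rules, min_count=2):
    """Simulate iterative augmentation of a seed rule set.

    batches    : list of iterations; each is a list of sentences, and each
                 sentence is a list of rule identifiers needed to annotate it
                 fully (a stand-in for the human annotator's work).
    seed_rules : rules available before the first iteration.
    min_count  : hypothetical threshold for "applied frequently".
    Returns the final rule set and the human-intervention rate (%) per batch.
    """
    rules = set(seed_rules)
    rates = []
    for batch in batches:
        manual = Counter()   # rules the annotator had to apply by hand
        total = 0            # all rule applications in this batch
        for sentence in batch:
            for rule in sentence:
                total += 1
                if rule not in rules:
                    manual[rule] += 1
        rate = 100.0 * sum(manual.values()) / total if total else 0.0
        rates.append(rate)
        # Promote frequently used manual rules to automatic rules.
        rules.update(r for r, c in manual.items() if c >= min_count)
    return rules, rates


# Toy data: two iterations with made-up phrase-structure rules.
batches = [
    [["NP->Det N", "VP->V NP", "S->NP VP"],
     ["NP->Det N", "VP->V NP", "S->NP VP"]],
    [["NP->Det N", "VP->V NP", "S->NP VP", "PP->P NP"]],
]
rules, rates = bootstrap(batches, seed_rules={"NP->Det N"})
print(rates)
```

In this toy run the intervention rate drops between the two iterations because the rules applied by hand twice in the first batch are applied automatically in the second. The rule seen only once stays manual, which mirrors the idea of promoting only frequently applied rules.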

Published

2012-01-01

How to Cite

Ghayoomi, M. (2012). Bootstrapping the Development of an HPSG-based Treebank for Persian. Linguistic Issues in Language Technology, 7. https://doi.org/10.33011/lilt.v7i.1301