Bootstrapping the Development of an HPSG-based Treebank for Persian
DOI:
https://doi.org/10.33011/lilt.v7i.1301
Keywords:
HPSG, Persian, CLaRK, annotation
Abstract
In this paper, we describe ongoing research to develop an HPSG-based treebank for Persian. To this end, we use a bootstrapping approach to data annotation. In the first step, a set of seed rules is defined as regular expressions in the CLaRK system, and the data is shallow-processed with this rule set. In the next step, a human annotator completes the annotation of the sentences manually. To increase the share of automatic annotation, we extract the manually applied rules and iteratively augment the seed rules with those applied frequently during manual annotation. Our experiment in building the Persian treebank, which currently contains 1000 sentences, shows that the proposed method reduces human intervention from 74.05% in the first iterations to 39.01% in the last iterations.
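The bootstrapping loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses Python's `re` module over a space-separated string of part-of-speech tags as a stand-in for CLaRK regular-expression grammars, and the rule format, the tag names, the function names `apply_rules`, `augment`, and `human_intervention`, and the `min_count` threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical seed rules: (phrase label, regex over a space-separated POS-tag string).
# Illustrative only; this is not CLaRK grammar syntax.
SEED_RULES = [("NP", r"\bDET N\b")]

def apply_rules(tags, rules):
    """Shallow-process one sentence: return (label, matched tag span) pairs."""
    text = " ".join(tags)
    found = []
    for label, pattern in rules:
        for match in re.finditer(pattern, text):
            found.append((label, match.group(0)))
    return found

def augment(rules, manual_rules, min_count=2):
    """Add manually applied rules that occur at least min_count times.

    manual_rules is a list of (label, tag sequence) pairs recorded while
    the annotator completed the analysis by hand (assumed format).
    """
    counts = Counter(manual_rules)
    existing = set(rules)
    new_rules = list(rules)
    for (label, tag_seq), n in counts.items():
        candidate = (label, r"\b" + re.escape(tag_seq) + r"\b")
        if n >= min_count and candidate not in existing:
            new_rules.append(candidate)
    return new_rules

def human_intervention(manual_steps, automatic_steps):
    """Percentage of annotation steps that had to be done by hand."""
    total = manual_steps + automatic_steps
    return 100.0 * manual_steps / total if total else 0.0
```

In each iteration one would call `apply_rules` on a new batch of sentences, record the rules the annotator applies by hand, and pass them to `augment` to produce the rule set for the next batch. The frequency threshold is a free parameter in this sketch; the abstract does not state the criterion actually used.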
License
This work is licensed under CC BY 4.0, which permits you to use, share, adapt, distribute, and reproduce it in any medium or format, provided you credit the original author(s) and source.