A Conditional Random Field Approach for Named Entity Recognition in Bengali and Hindi

Authors

  • Asif Ekbal, Department of Computational Linguistics, University of Heidelberg
  • Sivaji Bandyopadhyay, Department of Computer Science and Engineering, Jadavpur University

DOI:

https://doi.org/10.33011/lilt.v2i.1203

Keywords:

named entity, named entity recognition

Abstract

This paper describes the development of Named Entity Recognition (NER) systems for two leading Indian languages, Bengali and Hindi, using the Conditional Random Field (CRF) framework. The system uses several types of contextual information together with a variety of features that help predict the different named entity (NE) classes. The feature set includes both language-independent and language-dependent components. We used annotated corpora of 122,467 tokens for Bengali and 502,974 tokens for Hindi, tagged with a set of twelve NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). We considered only the tags that denote person names, location names, organization names, number expressions, time expressions and measurement expressions. A number of experiments were carried out to identify the most suitable features for NER in Bengali and Hindi. The system was tested on gold standard test sets of 35K tokens for Bengali and 50K tokens for Hindi. Evaluation on the test sets yields overall F-score values of 81.15% for Bengali and 78.29% for Hindi. 10-fold cross-validation yields F-score values of 83.89% for Bengali and 80.93% for Hindi. An ANOVA test shows that the performance improvement due to the language-dependent features is statistically significant.
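
As a rough illustration of the kind of contextual features a CRF tagger consumes, the following minimal Python sketch builds a feature dictionary for each token from the word itself, its affixes, and its neighbours in a small window. The function names, feature names, window size and affix length are all hypothetical choices made for this example; they are not the feature set used in the paper. The `f_score` helper assumes the usual balanced F-measure, the harmonic mean of precision and recall.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int,
                   window: int = 2, affix_len: int = 3) -> Dict[str, object]:
    """Build a feature dict for tokens[i].

    All feature names here are illustrative assumptions, not the
    features reported by the authors.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "word": word,
        "prefix": word[:affix_len],    # leading characters of the word
        "suffix": word[-affix_len:],   # trailing characters of the word
        "has_digit": any(ch.isdigit() for ch in word),
        "is_first": i == 0,            # sentence-initial position
    }
    # Contextual information: surrounding words inside the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j]
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A list of such dictionaries per sentence, paired with a list of NE tags, is the usual input shape for feature-based CRF toolkits. Language-dependent features (for example, gazetteer lookups) could be added as further keys in the same dictionary.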

Published

2009-11-01

How to Cite

Ekbal, A., & Bandyopadhyay, S. (2009). A Conditional Random Field Approach for Named Entity Recognition in Bengali and Hindi. Linguistic Issues in Language Technology, 2. https://doi.org/10.33011/lilt.v2i.1203

Section

Articles