Abstract: |
The extraction of named entities from unstructured text is a crucial component of numerous Natural Language Processing (NLP) applications, such as information retrieval, question answering, and machine translation. Named-Entity Recognition (NER) aims to locate proper nouns in unstructured text and classify them into a predefined set of types, such as persons, locations, and organizations. There has been extensive research on improving the accuracy of NER for English text. For other languages such as Arabic, extracting named entities is considerably more challenging because of the language's morphological structure. In this paper, we introduce ArabiaNer, a system that employs the Conditional Random Field (CRF) learning algorithm with extensive feature engineering to extract Arabic named entities effectively. ArabiaNer achieved state-of-the-art results, with an F1-score of 91.31% on the ANERcorp dataset. |