Learning Morphological Disambiguation Rules for Turkish
Deniz Yuret and Ferhan Ture (2006)
- Learning Morphological Disambiguation Rules for Turkish. In
Proceedings of the Human Language Technology Conference - North American Chapter of the
Association for Computational Linguistics Annual Meeting (HLT-NAACL
2006), June, 2006, New York City.
Abstract:
In this paper, we present a rule-based model for morphological
disambiguation of Turkish. The rules are generated by a novel
decision list learning algorithm using supervised training.
Morphological ambiguity (e.g., lives = live+s or life+s) is a
challenging problem for agglutinative languages like Turkish, where
close to half of the words in running text are morphologically
ambiguous. Furthermore, a word can take an unlimited number of
suffixes, so the number of possible morphological tags is also
unlimited. We attempted to cope with these
problems by training a separate model for each of the 126
morphological features recognized by the morphological analyzer.
The resulting decision lists independently vote on each of the
potential parses of a word, and the final parse is selected based on
our confidence in these votes. The accuracy of our model (96%) is
slightly above the best previously reported results, which use
statistical models. For comparison, when we train a single decision
list on full tags instead of using separate models on each feature
we get 91% accuracy.
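The per-feature voting scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the rule format, the tag names, the context key `prev`, and the scoring rule (add a rule's confidence when a parse agrees with its list's prediction, subtract it otherwise) are all assumptions, and the decision lists are written by hand rather than learned. For readability it reuses the abstract's English "lives" example instead of Turkish data. Each list is applied first-match-wins, and only features on which the candidate parses differ are allowed to vote.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Sequence, Tuple

# Hypothetical context format: the target word plus neighbouring words.
Context = Dict[str, str]


class Parse(NamedTuple):
    """One candidate analysis: a root plus a set of morphological feature tags."""
    root: str
    features: FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    """One decision-list rule: if `test` matches the context, predict `present`."""
    test: Callable[[Context], bool]
    present: bool       # predicted: does the feature occur in the correct parse?
    confidence: float   # assumed: an accuracy estimate for this rule


def apply_decision_list(rules: Sequence[Rule], ctx: Context) -> Tuple[bool, float]:
    """Return the prediction of the first rule whose test matches (first match wins)."""
    for rule in rules:
        if rule.test(ctx):
            return rule.present, rule.confidence
    raise ValueError("no rule matched; end each list with a default rule")


def choose_parse(parses: Sequence[Parse],
                 models: Dict[str, Sequence[Rule]],
                 ctx: Context) -> Parse:
    """Let each per-feature decision list vote on every candidate; return the best."""
    if not parses:
        raise ValueError("no candidate parses")
    # Features shared by all candidates cannot discriminate, so they do not vote.
    all_feats = set().union(*(p.features for p in parses))
    shared = set(parses[0].features).intersection(*(p.features for p in parses[1:]))
    scores: List[float] = [0.0] * len(parses)
    for feat in sorted(all_feats - shared):
        if feat not in models:
            continue  # no model for this feature in this toy setup
        present, conf = apply_decision_list(models[feat], ctx)
        for i, parse in enumerate(parses):
            # Assumed scoring: +confidence if the parse agrees with the vote, else -confidence.
            scores[i] += conf if (feat in parse.features) == present else -conf
    best = max(range(len(parses)), key=lambda i: scores[i])
    return parses[best]


# Illustrative data for the "lives" example (hypothetical tags and rules).
LIVES = [Parse("live", frozenset({"Verb", "Pres", "A3sg"})),
         Parse("life", frozenset({"Noun", "A3pl"}))]
MODELS = {
    "Noun": [Rule(lambda c: c.get("prev") == "their", True, 0.9),
             Rule(lambda c: True, False, 0.6)],    # default rule
    "Verb": [Rule(lambda c: c.get("prev") == "she", True, 0.95),
             Rule(lambda c: True, False, 0.55)],   # default rule
}

print(choose_parse(LIVES, MODELS, {"prev": "she"}).root)    # live
print(choose_parse(LIVES, MODELS, {"prev": "their"}).root)  # life
```

In this sketch each list only answers a yes/no question about a single feature, so its label space stays fixed no matter how many suffixes a word carries; a single list over full tags would instead have to choose among an open-ended set of tag strings.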
- A stand-alone Turkish morphological disambiguator is available
for download, as are the slides from the conference presentation.