Grammatical Disambiguation in the Tatar National Corpus

8 pages • Published: November 28, 2016

Abstract

This paper addresses grammatical ambiguity in the Tatar National Corpus and the possibilities for automating disambiguation in the corpus. Grammatical ambiguity is widespread in agglutinative languages such as the Turkic and Finno-Ugric languages. To build a grammatically disambiguated subcorpus, we have developed a special software module that searches for ambiguous tokens in the corpus, collects statistical information, and allows formal disambiguation rules to be created and applied for different ambiguity types. Disambiguation in the corpus is based on a context-oriented classification of ambiguity types, carried out on statistical corpus data for the Tatar language for the first time. The corpus thus serves both as the source of our research and as the destination for implementing its results. The estimated cumulative effect of disambiguating the identified frequent ambiguity types in the Tatar National Corpus can be up to 50%.

Keyphrases: agglutinative languages, disambiguation, linguistic corpus, morphology, Tatar language

In: Antonio Moreno Ortiz and Chantal Pérez-Hernández (editors). CILC2016. 8th International Conference on Corpus Linguistics, vol 1, pages 228-235.
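
The workflow described in the abstract (find ambiguous tokens, collect statistics per ambiguity type, apply context rules) can be illustrated with a minimal sketch. This is not the authors' module: the tokens (`w1`, `w2`, ...), the tag names, the rule format, and the function names are all hypothetical placeholders chosen for illustration, and the only context considered here is the right-hand neighbour.

```python
from collections import Counter

# Hypothetical toy corpus: each token is paired with the set of analyses a
# morphological analyser might return. Placeholder data, not real corpus data.
corpus = [
    [("w1", {"N"}), ("w2", {"N", "V"}), ("w3", {"V"})],
    [("w4", {"ADJ"}), ("w2", {"N", "V"}), ("w5", {"N"})],
    [("w6", {"N", "ADJ"}), ("w5", {"N"})],
]

def ambiguity_type(analyses):
    """Name an ambiguity type by its sorted competing tags, e.g. 'N/V'."""
    return "/".join(sorted(analyses))

def collect_statistics(sentences):
    """Count occurrences of each ambiguity type among ambiguous tokens."""
    counts = Counter()
    for sentence in sentences:
        for _, analyses in sentence:
            if len(analyses) > 1:
                counts[ambiguity_type(analyses)] += 1
    return counts

# Hypothetical rules: (ambiguity type, tag required on the right neighbour, tag to keep)
rules = [
    ("N/V", "V", "N"),
    ("ADJ/N", "N", "ADJ"),
]

def disambiguate(sentence, rules):
    """Apply the first matching rule to each ambiguous token.

    A rule fires only when the right-hand neighbour is itself unambiguous
    and carries exactly the required tag; otherwise the token is left as is.
    """
    result = []
    for i, (token, analyses) in enumerate(sentence):
        chosen = set(analyses)
        if len(analyses) > 1 and i + 1 < len(sentence):
            right = sentence[i + 1][1]
            for amb, context_tag, keep in rules:
                if ambiguity_type(analyses) == amb and right == {context_tag}:
                    chosen = {keep}
                    break
        result.append((token, chosen))
    return result

stats = collect_statistics(corpus)                   # ambiguity counts before rules
resolved = [disambiguate(s, rules) for s in corpus]  # corpus after rule application
remaining = collect_statistics(resolved)             # ambiguity counts after rules
```

Comparing `stats` with `remaining` gives the share of ambiguous tokens that the rules resolve on this toy data; on a real corpus the same comparison would be one way to estimate the cumulative effect of a rule set.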