
Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene / Nikola Ljubešić ; Tomaž Erjavec

By: Ljubešić, Nikola, computer scientist.
Contributor(s): Erjavec, Tomaž [aut].
Material type: Article
Publisher: 2016
Description: pp.
Other title: Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene [Title in English].
Subject(s): 5.04 | Part-of-Speech tagging; evaluation; Slavic languages
Online resources: Electronic version
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Language Resources and Evaluation Conference, LREC (10 ; 2016 ; Portorož)
Summary: In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available. We do not constrain the tagger by the lexicon entries, which allows for both lexicon incompleteness and noisiness. By using the lexicon indirectly through features, we allow known and unknown words to be tagged in the same manner. We test our tagger on Slovene data, obtaining a 25% error reduction over the best previous results on both known and unknown words. Given that Slovene is, in comparison to some other Slavic languages, a well-resourced language, we perform experiments on the impact of token (corpus) vs. type (lexicon) supervision, obtaining useful insights into how to balance the effort of extending resources to yield better tagging results.
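
Illustration: the summary's idea of using the lexicon "indirectly through features" can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `token_features`, the feature names, the suffix lengths, and the toy lexicon entries and tags are all hypothetical and not taken from the paper. The point it shows is that tags proposed by the lexicon become extra features for a statistical tagger rather than a hard filter on its output, so a word missing from the lexicon is handled by the same code path as a known word.

```python
from typing import Dict, List, Set

# Toy data, invented for illustration only (not from the paper's resources).
LEXICON: Dict[str, Set[str]] = {
    "hiša": {"Ncfsn"},
    "je": {"Va-r3s-n", "Pp3fsg--y"},
}
TOKENS: List[str] = ["Hiša", "je", "velika"]


def token_features(tokens: List[str], i: int,
                   lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Build a feature dictionary for tokens[i].

    Lexicon tags are added as features only; nothing here restricts
    which tag a downstream classifier may finally assign.
    """
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, float] = {
        "bias": 1.0,
        "word=" + lower: 1.0,
        "is_capitalised": float(word[:1].isupper()),
    }
    # Suffix features carry morphological cues for words the lexicon lacks.
    for n in range(1, 5):
        if len(lower) >= n:
            feats[f"suffix{n}=" + lower[-n:]] = 1.0
    # One feature per tag the lexicon proposes for this word form.
    for tag in sorted(lexicon.get(lower, ())):
        feats["lex_tag=" + tag] = 1.0
    # Explicit flag so a model can learn how far to trust lexicon coverage.
    feats["in_lexicon"] = float(lower in lexicon)
    return feats


known = token_features(TOKENS, 1, LEXICON)    # "je" is in the toy lexicon
unknown = token_features(TOKENS, 2, LEXICON)  # "velika" is not
```

Both `known` and `unknown` are ordinary feature dictionaries that could be passed to any sequence classifier; the only difference is that `unknown` carries no `lex_tag=` features and has `in_lexicon` set to 0.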


Project: MZOS project

ENG
