Tagset reductions in morphosyntactic tagging of Croatian texts / Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko.

By: Agić, Željko.
Contributor(s): Tadić, Marko [aut] | Dovedan Han, Zdravko [aut].
Material type: Article
Description: pp. 289-298.
Other title: Tagset Reductions in Morphosyntactic Tagging of Croatian Texts [Title in English].
Subject(s): 5.04 | 6.03 | morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language (hrv, eng)
In: 2nd International Conference The Future of Information Sciences (INFuture 2009) (3-6 November 2009 ; Zagreb, Croatia). The Future of Information Sciences: Digital Resources and Knowledge Sharing, pp. 289-298 / Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida
Summary: Morphosyntactic tagging of Croatian texts is performed with stochastic taggers using a language model built on a manually annotated corpus that implements the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is largely predetermined: it depends on two things, the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Because the 100 kw Croatia Weekly newspaper corpus yields a rather small language model for stochastic tagging of free-domain texts, the paper presents an approach based on tagset reductions. Several meaningful subsets of the Croatian Multext East version 3 morphosyntactic tagset specifications are created and applied to Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. The results are discussed in terms of applying different reductions in different natural language processing systems and in specific tasks defined by specific user requirements.

MZOS project 130-1300646-0645

MZOS project 130-1300646-1776

ENG
