Investigating language independence in HMM PoS/MSD-tagging / Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko.

By: Agić, Željko.
Contributor(s): Tadić, Marko [aut] | Dovedan Han, Zdravko [aut].
Material type: Article
Description: 657-662.
Other title: Investigating Language Independence in HMM PoS/MSD-Tagging [Title in English].
Subject(s): 2.09 | 5.04 | 6.03 | language independence, part-of-speech tagging, morphosyntactic tagging, hidden Markov models hrv | language independence, part-of-speech tagging, morphosyntactic tagging, hidden Markov models eng
In: 30th International Conference on Information Technology Interfaces (ITI 2008) (23-26.06.2008 ; Cavtat / Dubrovnik, Croatia). Proceedings of the 30th International Conference on Information Technology Interfaces, pp. 657-662. Lužar-Stiffler, Vesna ; Hljuz Dobrić, Vesna ; Bekić, Zoran
Summary: The paper presents an investigation of functional dependencies in morphosyntactic tagging using hidden Markov models. Starting from the well-known fact that the HMM tagging paradigm relies on lexical knowledge acquired from training corpora and stored in the form of transition and emission matrices, also called a language model, we apply the TnT trigram tagger to create language models for seven languages from the MULTEXT East v3 project translations of George Orwell's 1984: Czech, Estonian, Hungarian, Romanian, Serbian, Slovene and the original English version. We then use these language models in the tagging procedure and obtain details on various relations between training corpora statistics, training outputs and outputs of the tagging procedure.

Project MZOS 036-1300646-1986

Project MZOS 130-1300646-0645

Project MZOS 130-1300646-1776

ENG
