Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search / Merkler, Danijela ; Agić, Željko ; Tadić, Marko.

By: Merkler, Danijela.
Contributor(s): Tadić, Marko [aut] | Agić, Željko [aut].
Material type: Article
Description: pp.
Other title: Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search [Title in English]
Subject(s): 5.04 | 6.03 | automatic enrichment, morphological lexicon, large corpora (hrv) | automatic enrichment, morphological lexicon, large corpora (eng)
Online resources: available online
In: The 8th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2012) (19-21 September 2012; Dubrovnik, Croatia). Proceedings of FASSBL 2012
Summary: Inflectional (or morphological) lexica are important and frequently used language resources for highly inflectional languages such as Croatian. They serve many language processing tasks, from basic problems such as lemmatization and morphosyntactic tagging of written text to applications in machine learning, information extraction, information retrieval and machine translation. Since the Croatian Morphological Lexicon (HML) is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, unknown wordforms (those not found when matching unseen text against the current version of the HML database) are constantly logged, and the lexicon is updated to newer versions by inserting these new wordforms in batches. Accordingly, in this paper we propose a generic approach to the (semi-)automatic generation of new candidate lemmas for HML, their verification, the assignment of inflectional patterns, and finally the creation and insertion of new lexicon entries into HML, all in a single processing pipeline.
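The pipeline described in the summary (log unknown wordforms, generate candidate lemmas, verify them, assign an inflectional pattern, insert the new entry) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything in it is a hypothetical stand-in: the two toy `PATTERNS`, the illustrative MULTEXT-East-style tags, the function names, and the `corpus_freq` counter, which substitutes for the large-corpus and web-search verification. It also auto-selects the best-supported candidate at the point where the paper's semi-automatic approach would allow verification.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy patterns (NOT the real HML inventory): each maps an
# illustrative MULTEXT-East-style MSD tag to the suffix added to a stem.
# The first slot of every pattern is treated as the citation form (lemma).
PATTERNS = {
    "noun_masc_a": {"Ncmsn": "", "Ncmsg": "a", "Ncmsd": "u", "Ncmsi": "om"},
    "noun_fem_e": {"Ncfsn": "a", "Ncfsg": "e", "Ncfsa": "u", "Ncfsi": "om"},
}


@dataclass
class Candidate:
    lemma: str
    pattern: str
    forms: dict  # MSD tag -> generated wordform
    support: int = 0  # number of generated forms attested in the corpus


def unknown_wordforms(tokens, lexicon):
    """Step 1: log tokens that the current lexicon does not cover."""
    return Counter(t for t in tokens if t not in lexicon)


def generate_candidates(wordform):
    """Step 2: propose (lemma, pattern) pairs that could yield the wordform."""
    candidates = []
    for name, slots in PATTERNS.items():
        # dict.fromkeys dedupes suffixes while keeping a deterministic order
        for suffix in dict.fromkeys(slots.values()):
            if wordform.endswith(suffix) and len(wordform) > len(suffix):
                stem = wordform[: len(wordform) - len(suffix)]
                forms = {msd: stem + suf for msd, suf in slots.items()}
                lemma = next(iter(forms.values()))
                candidates.append(Candidate(lemma, name, forms))
    return candidates


def verify(candidates, corpus_freq, min_support=2):
    """Step 3: keep candidates whose paradigm is attested in the corpus.

    corpus_freq is a hypothetical stand-in for corpus/web frequency checks.
    """
    for c in candidates:
        c.support = sum(1 for f in c.forms.values() if corpus_freq.get(f, 0) > 0)
    kept = [c for c in candidates if c.support >= min_support]
    return sorted(kept, key=lambda c: c.support, reverse=True)


def insert_entry(lexicon, cand):
    """Step 4: add every generated form of the chosen candidate."""
    for msd, form in cand.forms.items():
        lexicon.setdefault(form, set()).add((cand.lemma, msd))


def enrich(tokens, lexicon, corpus_freq):
    """Run steps 1-4 as one pipeline, picking the best-supported candidate."""
    for form in unknown_wordforms(tokens, lexicon):
        if form in lexicon:  # already added via an earlier paradigm
            continue
        ranked = verify(generate_candidates(form), corpus_freq)
        if ranked:
            insert_entry(lexicon, ranked[0])
    return lexicon


lexicon = {"je": {("biti", "Var3s")}}
freq = Counter("grad grada gradu gradom gradovi".split())
enrich("gradom je".split(), lexicon, freq)
print(sorted(lexicon["gradom"]))  # [('grad', 'Ncmsi')]
```

For the unknown form "gradom", the sketch proposes three candidates: the masculine lemma "grad", the feminine lemma "grada", and the spurious lemma "gradom". Attestation counts rank the masculine paradigm highest (4 of 4 forms attested versus 3 for the feminine one), so its full paradigm is inserted.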

Inflectional (or morphological) lexica are considered to be language resources of high importance and frequent usage in many language processing tasks -- from basic problems such as lemmatization and morphosyntactic tagging of written text to applications in machine learning, information extraction, information retrieval and machine translation -- for highly inflectional languages such as Croatian. Being that Croatian Morphological Lexicon (HML) is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, unknown wordforms -- those undetected when matching unseen text with the current version of the HML database -- are constantly being logged and the lexicon is being updated to newer versions by inserting these new wordforms in batches. Accordingly, in this paper, we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, assignment of inflectional patterns and finally creation and insertion of new lexicon entries to HML in a single processing pipeline.

Project MZOS 130-1300646-0645

Project MZOS 130-1300646-1776

Language: English
