Vector Disambiguation for Translation Extraction from Comparable Corpora / Marianna Apidianaki ; Nikola Ljubešić ; Darja Fišer.

By: Apidianaki, Marianna.
Contributor(s): Ljubešić, Nikola, computer scientist [aut] | Fišer, Darja [aut].
Material type: Article
Publisher: 2013
Description: pp. 193-201.
Other title: Vector Disambiguation for Translation Extraction from Comparable Corpora [Title in English].
Subject(s): 5.04 | word sense disambiguation; sense clustering; comparable corpora
Online resources: Electronic version
In: Informatica (Ljubljana) 37 (2013), 2; pp. 193-201
Summary: We present a new data-driven approach for enhancing the extraction of translation equivalents from comparable corpora which exploits bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained from word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of using simple one-to-one mappings between the source and the target language, translation equivalents are clustered into sets of synonyms by a cross-lingual Word Sense Induction method. The obtained sense clusters enable us to expand the translation of vector features with several translation variants using a cross-lingual Word Sense Disambiguation method. Consequently, the vector features are disambiguated and translated with the translation variants included in the semantically most appropriate cluster, thus producing less noisy and richer vectors that allow for a more successful cross-lingual vector comparison than in previous methods.
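The pipeline the summary describes (disambiguate each vector feature, translate it with every variant in the chosen sense cluster, then compare vectors across languages) can be illustrated with a minimal sketch. This is not the authors' implementation. The cluster format, the cue-word overlap used as a stand-in for Word Sense Disambiguation, the cosine measure, and all names and toy data (`CLUSTERS`, `translate_vector`, the `T_...` placeholder translations) are assumptions made for illustration only.

```python
from math import sqrt

# Hypothetical toy lexicon: each source feature maps to sense clusters.
# "translations" are placeholder target-language words; "cues" are
# source-language words assumed to characterise that sense.
CLUSTERS = {
    "bank": [
        {"translations": {"T_bank_fin1", "T_bank_fin2"}, "cues": {"money", "loan"}},
        {"translations": {"T_bank_river"}, "cues": {"river", "water"}},
    ],
    "money": [{"translations": {"T_money"}, "cues": set()}],
}

def disambiguate(feature, context, clusters):
    """Pick the sense cluster whose cue words overlap most with the context."""
    return max(clusters[feature], key=lambda c: len(c["cues"] & context))

def translate_vector(vector, clusters):
    """Translate a source context vector feature by feature.

    Each feature is replaced by all translation variants of its selected
    cluster; features missing from the lexicon are dropped.
    """
    translated = {}
    context = set(vector)
    for feature, weight in vector.items():
        if feature not in clusters:
            continue
        best = disambiguate(feature, context - {feature}, clusters)
        for target_word in best["translations"]:
            translated[target_word] = translated.get(target_word, 0.0) + weight
    return translated

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy source vector and hypothetical target-language candidate vectors.
source_vector = {"bank": 2.0, "money": 1.0}
translated = translate_vector(source_vector, CLUSTERS)
candidates = {
    "cand_a": {"T_bank_fin1": 1.0, "T_money": 1.0},
    "cand_b": {"T_bank_river": 1.0},
}
best_candidate = max(candidates, key=lambda c: cosine(translated, candidates[c]))
```

In this toy setting the co-occurring feature "money" selects the financial cluster for "bank", so the translated vector carries both financial variants and omits the river sense, which is the effect the summary calls less noisy and richer vectors.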
No physical items for this record


Project: MZOS project

ENG
