
Disambiguating vectors for bilingual lexicon extraction from comparable corpora / Apidianaki, Marianna ; Ljubešić, Nikola ; Fišer, Darja.

By: Apidianaki, Marianna.
Contributor(s): Fišer, Darja [aut] | Ljubešić, Nikola, informatičar [aut].
Material type: Article
Description: pp. 10-15.
Other title: Disambiguating vectors for bilingual lexicon extraction from comparable corpora [Title in English].
Subject(s): 5.04 | bilingual lexicon extraction, cross-lingual sense clustering, feature disambiguation (hrv) | bilingual lexicon extraction, cross-lingual sense clustering, feature disambiguation (eng)
In: Eighth LANGUAGE TECHNOLOGIES Conference (8.-9.10.2012. ; Ljubljana, Slovenija). Proceedings of the Eighth LANGUAGE TECHNOLOGIES Conference, pp. 10-15 / Erjavec, Tomaž ; Žganec Gros, Jerneja
Summary: This paper presents an approach to enhance the extraction of translation equivalents from comparable corpora by plugging in bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained from word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of using simple 1:1 mappings between the source and the target language, translation equivalents are clustered into sets of synonyms based on contextual similarities, enabling us to expand the translation of vector features with several translation variants. And last but not least, the vector features are disambiguated and translated only with the translation variants from the most appropriate cluster, thus producing less noisy vectors that allow for a more successful cross-lingual comparison of the vectors compared to simpler methods.
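
Illustrative note: the feature disambiguation step described in the summary can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The toy English-to-Slovene clusters, the cue-word overlap used to pick a cluster, the tie-breaking rule, and the names `CLUSTERS`, `translate_vector` and `cosine` are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy sense clusters (invented for illustration): each source
# feature maps to a list of (target translations, source-language cue words).
CLUSTERS = {
    "bank": [
        ({"banka", "hranilnica"}, {"money", "loan"}),
        ({"breg", "obala"}, {"river", "water"}),
    ],
    "money": [({"denar"}, set())],
    "river": [({"reka"}, set())],
}


def translate_vector(vector, clusters):
    """Translate a source context vector {feature: weight} into the target
    language, using only the translations of the best-matching cluster."""
    context = set(vector)
    translated = {}
    for feature, weight in vector.items():
        senses = clusters.get(feature)
        if not senses:
            continue  # features without any translation are dropped
        others = context - {feature}
        # Assumption: the best cluster is the one whose cue words overlap most
        # with the other features in the vector; ties go to the first cluster.
        translations, _ = max(senses, key=lambda s: len(s[1] & others))
        for target in translations:
            translated[target] = translated.get(target, 0.0) + weight
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```

In this sketch a vector containing both "bank" and "river" is translated with the translations of the "breg"/"obala" cluster only, so the financial translations do not add noise to the cross-lingual comparison performed with `cosine`.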


Project MZOS 130-1301679-1380

ENG
