Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus / Brkić, Marija ; Matetić, Maja ; Seljan, Sanja.

By: Brkić, Marija.
Contributor(s): Matetić, Maja [aut] | Seljan, Sanja [aut].
Material type: Article
Description: pp. 1068-1070.
Other title: Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus [Title in English].
Subject(s): 5.04 | Sentence alignment ; alignment tools ; sentence alignment evaluation ; parallel corpus ; sentence-length ; word-correspondence (hrv) | Sentence alignment ; alignment tools ; sentence alignment evaluation ; parallel corpus ; sentence-length ; word-correspondence (eng)
In: 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011 (10-12 June 2011; Sichuan, China). Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011, pp. 1068-1070.
Summary: This paper presents the acquisition of a parallel bilingual corpus and all the steps involved in unsupervised sentence alignment, such as tokenization and lowercasing. Sentence alignment is not trivial, because translators do not necessarily translate one source-language sentence into exactly one target-language sentence. Three different unsupervised, language-independent approaches to sentence alignment are presented, and implementations of these approaches in three freely available tools are tested. A gold standard for evaluating automatic English-Croatian sentence alignment is created. Finally, a detailed analysis of the acquired corpus is given.
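The subject keywords name sentence length as one alignment criterion, and the summary notes that sentences are not always translated one-to-one. As a hedged illustration only (not the authors' code and not any of the tools tested in the paper), a length-based aligner in the style of Gale and Church can be sketched as a dynamic program over sentence character lengths that allows 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 "beads". The prior probabilities (`PRIORS`) and the length-model parameters (`MEAN_RATIO`, `VARIANCE`) are assumptions taken from values commonly used in Gale-Church implementations, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed bead types (number of source sentences, number of target sentences)
# with prior probabilities commonly used in Gale-Church style aligners.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # assumed expected target characters per source character
VARIANCE = 6.8    # assumed variance of that ratio per source character


def bead_cost(len_src: int, len_tgt: int, bead: Tuple[int, int]) -> float:
    """Negative log probability of aligning len_src chars with len_tgt chars."""
    mean = (len_src + len_tgt / MEAN_RATIO) / 2.0
    if mean == 0:
        delta = 0.0
    else:
        delta = (len_src * MEAN_RATIO - len_tgt) / math.sqrt(mean * VARIANCE)
    # Two-tailed normal probability of a length mismatch at least this large;
    # clamped so that very poor matches do not produce log(0).
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -(math.log(tail) + math.log(PRIORS[bead]))


def align(src_lens: List[int], tgt_lens: List[int]) -> List[Tuple[List[int], List[int]]]:
    """Find the cheapest sequence of beads covering both sentence lists in order."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PRIORS:
                if (i, j) == (0, 0) or di > i or dj > j:
                    continue
                prev = cost[i - di][j - dj]
                if prev == inf:
                    continue
                total = prev + bead_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), (di, dj)
                )
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Trace back from the end to recover the chosen beads in order.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

Called as `align([len(s) for s in src], [len(t) for t in tgt])`, the sketch returns beads that pair source sentence indices with target sentence indices. For example, `align([20, 25], [46])` yields a single 2-1 bead, `[([0, 1], [0])]`, because two short source sentences whose combined length matches one target sentence are cheaper to merge than to leave one of them unaligned.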

Project MZOS 130-1300646-0909

Language: English
