
Corpus-Based Comparison of Contemporary Croatian, Serbian and Bosnian / Bekavac, Božo ; Seljan, Sanja ; Simeon, Ivana.

By: Bekavac, Božo.
Contributor(s): Seljan, Sanja [aut] | Simeon, Ivana [aut].
Material type: Article
Description: 33-39.
Other title: Corpus-Based Comparison of Contemporary Croatian, Serbian and Bosnian [Title in English].
Subject(s): 5.04 | 6.03 | slavenski jezici, hrvatski, srpski, bosanski, jezične razlike [hrv] | Slavic languages, Croatian, Serbian, Bosnian, language differences [eng]
In: Formal Approaches to South Slavic and Balkan Languages FASSBL (25-28.09.2008 ; Dubrovnik, Croatia). Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages, pp. 33-39. Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla
Summary: This paper explores the differences between three Slavic languages: Bosnian, Croatian and Serbian, drawing on the Southeast European Times newspaper corpus, translated into each language from the source English text and consisting of approximately 330,000 tokens for each language. The paper aims to contribute to the establishment of criteria and a methodology for measuring similarities between these languages. The differences were explored at five levels: phonology, morphology, lexis, syntax and semantics. Empirical analysis has shown that a large portion of the differences across the three languages are systematic and regular and, as such, could be formalized for automatic translation/generation. The results of this study and of similar future corpus-based studies can be used in developing NLP tools such as annotation tools, e-dictionaries, text summarizers, machine translation systems and computer-assisted language learning for the three languages, as well as in further linguistic investigation of their mutual relationship.


MZOS project 130-1300646-0645

MZOS project 130-1300646-0909

MZOS project 130-1300646-1002

ENG
