
The SETimes.HR Linguistically Annotated Corpus of Croatian / Agić, Željko ; Ljubešić, Nikola.

By: Agić, Željko.
Contributor(s): Ljubešić, Nikola, computer scientist [aut].
Material type: Article
Description: pp. 1724-1727.
Other title: The SETimes.HR Linguistically Annotated Corpus of Croatian [Title in English].
Subject(s): 5.04 | dependency treebank, Croatian language, free availability
In: International Conference on Language Resources and Evaluation, LREC (9 ; 2014 ; Reykjavik, Iceland). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 1724-1727. Editors: Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios.
Summary: We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support direct model transfer evaluation between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETIMES.HR and the test sets, providing the state of the art in all the tasks. We make all resources presented in the paper freely available under a very permissive licensing scheme.


MZOS project 130-1300646-1776

ENG
