Ljubešić, Nikola, computer scientist
Statistical machine translation of Croatian weather forecasts: how much data do we need? / Ljubešić, Nikola ; Bago, Petra ; Boras, Damir. - pp. 303-308
This research is the first step towards developing a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus consists of a one-year sample of weather forecasts for the Adriatic and comprises 7,893 sentence pairs. Evaluation is performed with the automatic evaluation measures BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We show that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy in terms of adequacy and fluency. Further improvement is expected from increasing the size of the training set. Finally, the correlation between the recorded evaluation measures is explored.
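The abstract mentions exploring the correlation between evaluation measures but does not state which correlation coefficient was used. The following is a minimal sketch, assuming Pearson's r computed over paired per-item scores from two measures (for example BLEU and METEOR). The function name `pearson` and the score lists are hypothetical and purely illustrative; they are not data or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists.

    Assumption: each position i holds the scores that two evaluation
    measures assigned to the same translated item.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance sums (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-item scores for two measures (illustrative only).
bleu_scores = [0.41, 0.55, 0.62, 0.70]
meteor_scores = [0.52, 0.60, 0.71, 0.78]
print(round(pearson(bleu_scores, meteor_scores), 3))
```

A rank-based coefficient such as Spearman's rho would be a reasonable alternative when only the ordering of scores is considered reliable.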
Boras, Damir ; Bago, Petra