Towards a part-of-speech tagger for Sranan Tongo
Abstract
This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a Creole language from South America with around half a million speakers. Because Sranan Tongo has no written corpus and text annotation is expensive and time-consuming, the proposed first step was to train a POS tagger on only 550 sentences hand-annotated with part-of-speech tags.
In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further by adding more training data. To this end, the tagger was used to annotate 2,406 sentences; its output was hand-corrected and then used to retrain the model. The performance of the POS tagger on three texts is compared before and after the inclusion of the new training data.
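The tag, hand-correct, retrain, and compare cycle described above can be illustrated with a minimal sketch. It is not the authors' hybrid tagger: a most-frequent-tag lookup stands in for the model, the function names (`train`, `tag`, `accuracy`) are hypothetical, and the tokens and tags are toy placeholders rather than real Sranan Tongo data.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Build a most-frequent-tag lexicon from sentences of (token, tag) pairs.

    Stand-in model for illustration only; the paper's tagger is a
    rule-based/stochastic hybrid, not this baseline.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token, pos in sentence:
            counts[token][pos] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}

def tag(model, tokens, default="NOUN"):
    """Tag tokens with the lexicon; unknown tokens get an assumed default tag."""
    return [(token, model.get(token, default)) for token in tokens]

def accuracy(model, gold_sentences):
    """Token-level accuracy of the model against hand-annotated sentences."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag(model, [token for token, _ in sentence])
        correct += sum(p == g for (_, p), (_, g) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total

# Toy placeholder data (NOT real annotations).
seed = [[("a", "DET"), ("b", "NOUN")]]                     # initial hand-annotated set
corrected = [[("c", "VERB"), ("a", "DET"), ("b", "NOUN")]]  # tagger output after hand-correction
held_out = [[("a", "DET"), ("b", "NOUN"), ("c", "VERB")]]   # evaluation text

model_before = train(seed)
model_after = train(seed + corrected)  # retrain with the corrected sentences added

acc_before = accuracy(model_before, held_out)
acc_after = accuracy(model_after, held_out)
print(f"before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setting the token `c` is unknown to the first model and falls back to the default tag, so adding the corrected sentence raises accuracy on the held-out text; real gains depend on the data and the model.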
DOI: 10.25559/INJOIT.2307-8162.09.202112.99-103
References
Cortegoso Vissio N., Zakharov V. A rule-stochastic hybrid POS-tagger for Sranan Tongo with minimal lexicon and training dataset. In: Proceedings of the International Conference «Corpus Linguistics-2021». Saint-Petersburg, Sofia-Press. 2021 (in print).
Radke H. Niederländisch und Sranantongo in Surinamischer Onlinekommunikation // Taal en Tongval. University Press, Amsterdam, 2017. Vol. 69. P. 113-136.
Sebba M. Contact languages: pidgins and creoles. Palgrave Macmillan, 1997.
Wortubuku fu Sranan Tongo. SIL International. URL: https://www.sil.org/resources/archives/13426 (accessed: 10.10.2021).
Yakpo K., Bruyn A. Transatlantic patterns: The relexification of locative constructions in Sranan // Surviving the Middle Passage: The West Africa-Surinam Sprachbund / Pieter Muysken, Norval Smith (Eds.). De Gruyter Mouton, Berlin, 2015. P. 135–175.
Wilner J. Wortubuku fu Sranan Tongo. Sranan Tongo-English Dictionary / John Wilner (ed.), Ronald Pinas, Lucien Donk, Hertoch Linger, Arnie Lo-Ning-Hing, Tieneke MacBean, Celita Zebeda-Bendt, Chiquita Pawironadi-Nunez, Dorothy Wong Loi Sing. SIL International, 2007. 5th ed.
Wilner J. Wortubuku fu Sranan Tongo. Sranan Tongo-Nederlands Woordenboek / John Wilner (ed.), Ronald Pinas, Lucien Donk, Hertoch Linger, Arnie Lo-Ning-Hing, Tieneke MacBean, Celita Zebeda-Bendt, Chiquita Pawironadi-Nunez, Dorothy Wong Loi Sing. SIL International, 2007. 5th ed.
Nickel M., Wilner J. Papers on Sranan Tongo. Summer Institute of Linguistics, 1984. URL: https://archive.org/details/rosettaproject_srn_morsyn-1 (accessed: 05.04.2021).
Winford D., Plag I. Sranan structure dataset // Atlas of Pidgin and Creole Language Structures Online / Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, Magnus Huber (Eds.). Leipzig: Max Planck Institute for Evolutionary Anthropology, 2013. URL: http://apics-online.info/contributions/2 (accessed: 03.10.2021).
Jurafsky D. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition / Daniel Jurafsky, James H. Martin. Prentice Hall, New Jersey, 2008. 2nd edition.
Rijksoverheid. Skowtu hori yu na ini a tori fu wan ordru fu den bakrakondre, nanga den tyari yu na skowt’oso noso wan tra presi pe den o yere yu. URL: https://www.rijksoverheid.nl/documenten/brochures/2014/07/01/u-bent-aangehouden-in-verband-met-een-europees-aanhoudingsbevel-en-meegenomen-naar-het-politiebureau-of-andere-verhoorlocatie.-wat-zijn-uw-rechten-sranan-tongo (accessed: 10.10.2021).
MacBean G. A gridi frow fu fisman Albert. Instituut voor Taalwetenschap (SIL). 1993. URL: http://suriname-languages.sil.org/Sranan/English/SrananEngLLIndex.html (accessed: 10.10.2021).
Pinas E. San pesa ini Kaneri. Nieuwe Surinaamse verhalen. M. van Kempen (comp.). Uitgeverij De Volksboekwinkel, Paramaribo. 1986.
Cortegoso Vissio N. A part of speech tagger for Sranan Tongo based on a Trigram Hidden Markov Model // GitHub repository. URL: https://github.com/nicolascortegoso/HMM-for-sranantongo (accessed: 10.10.2021).
ISSN: 2307-8162