Russian-Tatar Socio-political Thesaurus: Publishing in the Linguistic Linked Open Data Cloud

A. Galieva, A. Kirillovich, N. Loukachevitch, O. Nevzorova, D. Suleymanov, D. Yakubova

Abstract


The paper discusses the main principles and practical aspects of building a new bilingual lexical resource, the Russian-Tatar socio-political thesaurus, and of publishing it in the Linguistic Linked Open Data (LLOD) cloud. The thesaurus is based on the format of the Russian RuThes thesaurus, which is organized as a hierarchy of concepts. Each concept has a unique name and a set of language expressions (text entries) that refer to it in texts. The authors describe the general methodology for translating concept names and their text entries, as well as ways of reflecting the specific features of the Tatar lexical-semantic system. The models and ontologies used in the LLOD cloud are reviewed. The paper then describes the constructed resource as a linked data set prepared for the LLOD cloud.

Full Text:

PDF (Russian)





ISSN: 2307-8162