Conceptual ambiguity in English texts on terrorism: causes and disambiguation methods

A. Yu. Zinoveva

Abstract


Current natural language processing research frequently addresses content semantization, including that of unstructured texts such as electronic news, by means of semantic annotation or its special case, ontology-based, domain-oriented conceptual annotation. Conceptual annotation is often complicated by conceptual ambiguity, which manifests itself in one-to-many mappings between lexical items and ontology concepts. This paper examines the causes of conceptual ambiguity in restricted-domain texts, using English-language electronic news on terror attacks as a case study. Four causes of conceptual ambiguity are identified: part-of-speech homonymy, lexical ambiguity, the plurality of conceptual meanings (the most productive cause), and the extralinguistic context (the least productive, but the hardest to resolve). Three quantitative disambiguation methods are studied: a) tag ranking, b) a contextual method based on a bigram model, and c) a positional method. All three methods prove useful for computer-aided conceptual disambiguation, but they are not accurate enough when used alone, and rule-based methods would be a good addition.
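As a rough illustration only, the one-to-many mapping between a lexical item and ontology concepts, together with the first two quantitative methods named above, might be sketched as follows. The sketch assumes that tag ranking means choosing the concept tag most often assigned to an item and that the bigram-based method conditions that choice on the preceding word; the abstract does not specify either procedure, so this is not the author's implementation. The concept names, the sample data, and the functions `train` and `disambiguate` are hypothetical, and the positional method is not covered.

```python
from collections import Counter, defaultdict

# Hypothetical annotated samples: (previous word, lexical item, concept tag).
# The concept tags are invented for illustration and are not from the paper.
ANNOTATED = [
    ("the", "attack", "TERRORIST-ACT"),
    ("suicide", "attack", "TERRORIST-ACT"),
    ("bomb", "attack", "TERRORIST-ACT"),
    ("heart", "attack", "MEDICAL-EVENT"),
    ("heart", "attack", "MEDICAL-EVENT"),
]


def train(samples):
    """Count concept tags per lexical item and per (previous word, item) pair."""
    unigram = defaultdict(Counter)  # item -> tag frequencies (tag ranking)
    bigram = defaultdict(Counter)   # (prev, item) -> tag frequencies (context)
    for prev, item, tag in samples:
        unigram[item][tag] += 1
        bigram[(prev, item)][tag] += 1
    return unigram, bigram


def disambiguate(prev, item, unigram, bigram):
    """Prefer the tag seen most often in this bigram context; otherwise
    fall back to the item's top-ranked tag; return None for unknown items."""
    if (prev, item) in bigram:
        return bigram[(prev, item)].most_common(1)[0][0]
    if item in unigram:
        return unigram[item].most_common(1)[0][0]
    return None


unigram, bigram = train(ANNOTATED)
print(disambiguate("heart", "attack", unigram, bigram))
print(disambiguate("deadly", "attack", unigram, bigram))
```

In this toy data the item "attack" maps to two concepts, so the context lookup and the frequency fallback can disagree; a rule-based layer of the kind the abstract recommends would sit on top of such counts.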

Full Text:

PDF (Russian)

DOI: 10.25559/INJOIT.2307-8162.08.202011.64-72



ISSN: 2307-8162