Content analysis of big qualitative data

Anton Oleinik


Working with big data in science (research databanks, literature reviews) and in everyday life (news aggregators) creates a need for mining, classifying and storing information, where information is defined as data in a processed form. The methodology of content analysis in its various forms, qualitative (manual coding), quantitative (word frequencies and co-occurrences) and mixed methods (creation of ad hoc dictionaries based on substitution), offers a tool to address this need. Interest in content analysis emerged as early as the 1970s, yet it remains relatively unknown outside sociology, linguistics and communication studies. Content analysis converts qualitative data (texts, images) into digital format (vectors and matrices), which can then be manipulated using linear algebra, multidimensional scaling and other tools from the natural sciences. The conversion into digital format also paves the way to machine learning. Supervised machine learning looks particularly promising because it keeps the focus on the interpretation of data that is characteristic of interpretative sociology, and it is compatible with mixed methods content analysis. The existing programs for computer-assisted content analysis (QDA Miner, ATLAS.ti, NVivo, etc.) have several limitations, one of which is the restriction on the number of users (coders). The creation of online platforms for content analysis allows bypassing this and some other limitations. The idea of creating an online databank for qualitative data, together with a platform for its content analysis, is discussed. In contrast to quantitative data, qualitative research data is rarely available for secondary analysis.
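As a rough illustration of the conversion of texts into vectors and matrices mentioned in the abstract, the sketch below builds a term-document matrix of word frequencies and a document-level co-occurrence matrix. It is a minimal example under stated assumptions, not the method used in the article: the function names (`tokenize`, `term_document_matrix`, `cooccurrence_matrix`), the tokenization rule and the three sample sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] is the frequency
    of vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix


def cooccurrence_matrix(vocab, matrix):
    """For each pair of words, count the documents that contain both."""
    n = len(vocab)
    co = [[0] * n for _ in range(n)]
    for row in matrix:
        present = [j for j, freq in enumerate(row) if freq > 0]
        for a, b in combinations(present, 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


# Hypothetical sample corpus, for illustration only.
docs = [
    "Big data needs content analysis.",
    "Content analysis converts texts into data.",
    "Machine learning needs data.",
]
vocab, tdm = term_document_matrix(docs)
co = cooccurrence_matrix(vocab, tdm)
```

Each row of `tdm` is a document vector, and `co` is a symmetric word-by-word matrix; both could then be passed to linear algebra or multidimensional scaling routines of the kind the abstract refers to.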

Full Text:

PDF (Russian)





ISSN: 2307-8162