Frequency analysis of text using a computer

Robert Mayer


This article considers a method of frequency analysis of text that yields spectral distributions of letters, words, and semantic segments. The aims of the article are: 1) to create computer programs that compute the distribution spectra of words and individual characters in large texts; 2) to test them on V.G. Korolenko's novel "Children of the Dungeon"; 3) to build a probabilistic model of the writer. Three programs written in ABCPascal are presented; they compute: 1) the frequency distribution of letters and letter combinations; 2) the spectral distribution of words and semantic segments; 3) the number of transitions from semantic segments of length n to semantic segments of length m. The article provides: 1) the spectral distributions of the vowels "o", "a", "e", "i", "u", "ya", "yu" in the analyzed text; 2) the frequency distribution of words by length; 3) the spectrum of semantic segments, that is, stretches of text bounded by punctuation marks; 4) the matrix of transitions from semantic segments of length n to semantic segments of length m; 5) the table of probabilities of these transitions; 6) the graph of a probabilistic automaton that simulates the author's generation of text. The vertices of the graph correspond to the number of words in a semantic segment, and the edges correspond to the most likely transitions. Together, these results characterize the individual features of the author's style and can be used to establish authorship.
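
The quantities listed in the abstract can be illustrated with a minimal Python sketch. It is not the author's ABCPascal code. It rests on several assumptions: a word is taken to be a run of letters, a semantic segment is taken to be the text between any of the marks . , ; : ! ?, and the transition probability from length n to length m is estimated as the count of (n, m) transitions divided by the count of all transitions leaving n. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of letters (any alphabet, no digits).
WORD = re.compile(r"[^\W\d_]+")
# Assumption: these marks delimit semantic segments; the article may use another set.
SEGMENT_BREAK = re.compile(r"[.,;:!?]+")


def letter_frequencies(text):
    """Relative frequency of each letter, ignoring case and non-letters."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()} if total else {}


def word_length_spectrum(text):
    """Number of words of each length (in letters)."""
    return Counter(len(w) for w in WORD.findall(text))


def segment_lengths(text):
    """Length in words of each punctuation-delimited segment, in text order."""
    lengths = [len(WORD.findall(seg)) for seg in SEGMENT_BREAK.split(text)]
    return [n for n in lengths if n > 0]  # drop empty segments


def transition_counts(lengths):
    """Count transitions from a segment of length n to the next one of length m."""
    return Counter(zip(lengths, lengths[1:]))


def transition_probabilities(counts):
    """Estimate P(m | n) by normalizing each row n of the transition counts."""
    row_totals = Counter()
    for (n, _m), c in counts.items():
        row_totals[n] += c
    return {(n, m): c / row_totals[n] for (n, m), c in counts.items()}
```

Calling `transition_probabilities(transition_counts(segment_lengths(text)))` on a text gives a dictionary keyed by (n, m) pairs. Under these assumptions, keeping only the highest-probability entries for each n would give the edges of a graph like the one the abstract describes.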



