Analysis of Twitter posts with the stream processing systems Apache Spark and Apache Storm

N.A. Gorshkov, V.S. Denisiov


The article compares the stream processing systems Apache Storm and Apache Spark on the task of analyzing posts from the social network Twitter. It first describes the basic concepts of the two engines, their configuration, and how applications are launched. It then presents the specific tweet analysis tasks considered and the structure of the cluster on which the performance test was run. Finally, it draws conclusions about the applicability of Storm and Spark to these tasks.

Full Text:

PDF (Russian)



ISSN: 2307-8162