Machine learning model serving system for event streams
Abstract
Full Text: PDF (Russian)
References
Shahrivari S, Jalili S. Beyond batch processing: towards real-time and streaming big data. Computers. 2014; 3(4):117–29.
Thones, J. Microservices. IEEE Software 32, 1, 2015, 116 – 116.
T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernández-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, et al. The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB, 2015.
Iqbal, M.H., Soomro, T.R. Big data analysis: Apache storm perspective. Int. J. Comput. Trends Technol, 2015; 9–14.
Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, et al. “Apache Spark: A Unified Engine for Big Data Processing”. In: Commun. ACM 59.11, 2016, pp. 56–65.
Nair, L. R., Shetty, S. D., & Shetty, S. D. Applying spark based machine learning model on streaming big data for health status prediction. Computers & Electrical Engineering, 2017.
M. Kiran, P. Murphy, I. Monga, J. Dugan, and S. S. Baveja. Lambda architecture for cost-effective batch and speed big data processing. In IEEE Intl Conf. on Big Data, pages 2785–2792. IEEE, 2015.
P. Carbone, S. Ewen, S. Haridi, A. Katsifodimos, V. Markl, and K. Tzoumas. Apache Flink: Stream and batch processing in a single engine. IEEE Data Engineering Bulletin, page 28, 2015.
Bejeck, William P. Kafka Streams in Action: Real-time Apps and Microservices with the Kafka Streams API. Shelter Island, NY: Manning Publications, 2018.
Shree R. et al. KAFKA: The modern platform for data management and analysis in big data domain. In Proceedings of the 2nd International Conference on Telecommunication and Networks (TEL-NET), 2017.
S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh, J. Bringhurst, I. Gupta, and R. H. Campbell. Samza: Stateful Scalable Stream Processing at LinkedIn. Proc. VLDB Endow., 10(12):1634–1645, Aug. 2017.
Confluence. Retrieved: April 2020 https://www.atlassian.com/ru/software/confluence
LinkedIn. Retrieved: April 2020 https://www.linkedin.com
Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler: Apache Hadoop YARN: yet another resource negotiator. SoCC 2013:5
Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science 349(6245), 2015, pp. 255–260.
SAS. Retrieved: April 2020 https://www.sas.com
Google. Retrieved: April 2020 https://about.google
Microsoft. Retrieved: April 2020 https://www.microsoft.com
Facebook. Company info. Retrieved: June 2019. https://newsroom.fb.com/company-info
K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, J. Law, K. Lee, J. Lu, P. Noordhuis, M. Smelyanskiy, L. Xiong, and X. Wang, “Applied machine learning at facebook: A datacenter infrastructure perspective,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018.
Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang. Machine Learning at Facebook: Understanding Inference at the Edge. Facebook, Inc., 2019.
Introducing FBLearner Flow: Facebook’s AI backbone. Retrieved: June 2019. https://code.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone
Open Source Search & Analytics • Elasticsearch | Elastic. Retrieved: June 2019. https://www.elastic.co
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. Retrieved: May 2020. https://www.tensorflow.org/tfx
TensorFlow - An end-to-end open source machine learning platform. Retrieved: May 2020. https://www.tensorflow.org/
Sara Landset, Taghi M. Khoshgoftaar, Aaron N. Richter, and Tawfiq Hasanin. 2015. A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data 2, 1 (2015), 24
Jimmy J. Lin and Alek Kolcz. 2012. Large-scale machine learning at twitter. In SIGMOD. 793–804.
Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, and Benjamin Recht. 2016. KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics. CoRR abs/1610.09451 (2016).
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, and Dan Dennison. 2015. Hidden Technical Debt in Machine Learning Systems. In NIPS. 2503–2511.
Yann Dauphin, Razvan Pascanu, Çağlar Gülçehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. CoRR abs/1406.2572 (2014).
Apache Beam: An Advanced Unified Programming Model. Retrieved: June 2019. https://beam.apache.org
Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Trans. on Knowl. and Data Eng. 22, 10 (Oct. 2010), 1345–1359.
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In NIPS. 3320–3328.
Running your models in production with TensorFlow Serving. Retrieved: June 2019 https://ai.googleblog.com/2016/02/running-your-models-in-production-with.html
Jeremy Hermann and Mike Del Balso. Meet Michelangelo: Uber’s machine learning platform. https://eng.uber.com/michelangelo, 2017. [Online; accessed 14-April-2019].
Jupyter Notebook. Retrieved: June 2019 https://jupyter.org
Databricks Inc. Retrieved: June 2019 https://databricks.com
M. Zaharia, A. Chen, A. Davidson, A. Ghodsi, S. Hong, A. Konwinski, S. Murching, T. Nykodym, P. Ogilvie, M. Parkhe, F. Xie, and C. Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Engineering Bulletin, 41(4), 2018.
Klaus Greff, Aaron Klein, Martin Chovanec, Frank Hutter, and Jurgen Schmidhuber. The Sacred Infrastructure for Computational Research. In Katy Huff, David Lippa, Dillon Niederhut, and M. Pacer, editors, Proceedings of the 16th Python in Science Conference, pages 49 – 56, 2017.
M. Vartak, H. Subramanyam, W.-E. Lee, S. Viswanathan, S. Husnoo, S. Madden, and M. Zaharia. ModelDB: A system for machine learning model management. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA ’16, pages 14:1–14:3, New York, NY, USA, 2016. ACM.
Google. Tensorboard: Visualizing learning. Retrieved: June 2019 https://www.tensorflow.org/guide/summaries_and_tensorboard.
Git. Retrieved: June 2019. https://git-scm.com
GitHub. Retrieved: June 2019. https://github.com
Sklearn. Retrieved: June 2019. https://scikit-learn.org/stable
Zhang YY, Jiao YQ. Design and Implementation of Predictive Model Markup Language Interpretation Engine. 2015 International Conference on Network and Information Systems for Computers (ICNISC). 2015:527–31. doi:10.1109/ICNISC.2015.105.
Data Mining Group. Retrieved: June 2019. http://dmg.org
PMML examples. Retrieved: May 2020. http://dmg.org/pmml/pmml_examples/index.html
Library for serialization and deserialization PMML models. Retrieved: May 2020. https://github.com/nyoka-pmml/nyoka
Spark MLlib library. Retrieved: May 2020. https://spark.apache.org/mllib/
Flink-jpmml library. Retrieved: May 2020. https://github.com/FlinkML/flink-jpmml
Java PMML API. Retrieved: May 2020. https://github.com/jpmml
Tensorflow, saved model. Retrieved: June 2019. https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/saved_model
Protobuf. Retrieved: June 2019. https://developers.google.com/protocol-buffers/?hl=ru
ONNX. Retrieved: June 2019. https://onnx.ai
ONNX libraries. Retrieved: May 2020. https://github.com/onnx/tutorials
ONNX Model Zoo. Retrieved: May 2020. https://github.com/onnx/models
MLeap. Retrieved: June 2019. http://mleap-docs.combust.ml
J. Pivarski, C. Bennett and RL. Grossman, “Deploying Analytics with the Portable Format for Analytics (PFA)”, Proceedings of the International Conference of Knowledge Discovery and Data Mining, (2016)
PFA description. Retrieved: May 2020. http://dmg.org/pfa/docs/motivation/
Plase D., Niedrite L., Taranovs R. Comparison of HDFS compact data formats: Avro Versus Parquet // Mokslas-Lietuvos ateitis. 2017. No. 9. P. 267-276.
Titus 2 - Portable Format for Analytics (PFA) implementation for Python 3.5+. Retrieved: May 2020. https://github.com/animator/titus2
How to Easily Deploy Machine Learning Models Using Flask. Retrieved: May 2020. https://towardsdatascience.com/how-to-easily-deploy-machine-learning-models-using-flask-b95af8fe34d4
Python serialization format – pickle. Retrieved: May 2020. https://docs.python.org/3/library/pickle.html
Docker. Retrieved: May 2020. https://www.docker.com/
Nginx. Retrieved: May 2020. https://nginx.org/ru/
Apache Tomcat. Retrieved: May 2020. http://tomcat.apache.org/
Implement RESTful Web Service using Java. Retrieved: May 2020. https://habr.com/ru/post/150034/
TensorFlow - Serving Models. Retrieved: May 2020. https://www.tensorflow.org/tfx/guide/serving
Release of TensorFlow Serving as an open source tool for serving machine learning models in production. Retrieved: May 2020. https://ai.googleblog.com/2016/02/running-your-models-in-production-with.html
ONNX to TensorFlow SavedModel. Retrieved: May 2020. https://github.com/onnx/onnx-tensorflow/issues/490
PyTorch. Retrieved: May 2020. https://pytorch.org
GRPC. Retrieved: May 2020. https://grpc.io/
Remote call procedure, wikipedia. Retrieved: May 2020. https://en.wikipedia.org/wiki/Remote_procedure_call
Kubeflow. Retrieved: May 2020. https://www.kubeflow.org/
Seldon.io. Retrieved: May 2020. https://www.seldon.io/tech/
Hydrosphere.io. Retrieved: May 2020. https://hydrosphere.io/
Kubernetes. Retrieved: May 2020. https://kubernetes.io/
Boris Lublinsky. Serving Machine Learning Models. O'Reilly Media, Inc. 2017.
PMML model export - RDD-based API. Retrieved: May 2020. https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
Looking under the hood of pipelines. Retrieved: May 2020. https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/libs/ml/pipelines.html
Release Notes - Flink 1.9. Retrieved: May 2020. https://github.com/apache/flink/blob/master/docs/release-notes/flink-1.9.md
FLIP-39 Flink ML pipeline and ML libs. Retrieved: May 2020. https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
Twitter. Retrieved: May 2020. https://twitter.com/
Fisher's Iris data set. Wikipedia. Retrieved: May 2020. https://ru.wikipedia.org/wiki/%D0%98%D1%80%D0%B8%D1%81%D1%8B_%D0%A4%D0%B8%D1%88%D0%B5%D1%80%D0%B0
Prometheus.io. Retrieved: May 2020. https://prometheus.io/
Grafana.io. Retrieved: May 2020. https://grafana.com/
Asynchronous I/O Apache Flink. Retrieved: May 2020. https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
Machine learning model serving system for event streams. Retrieved: May 2020. https://github.com/axreldable/msu-diploma-thesis
ISSN: 2307-8162