
Construction Scheme of a Scalable Distributed Stream Processing Infrastructure Using Ray and Apache Kafka

10 pages · Published: March 13, 2019

Abstract

The spread of various sensors and the development of cloud computing technologies enable ordinary homes to accumulate and use large numbers of live logs. Operating a service that uses sensor data is difficult because it is impractical to install servers and storage in ordinary homes to analyze the data collected from sensors. Instead, the data are typically transmitted from the sensors to a cloud and analyzed there. However, services that involve moving image (video) analysis must transfer large amounts of data continuously and require high computing power for analysis. Processing such data in real time in the cloud with a conventional stream data processing framework is therefore highly difficult. In this research, we propose a construction scheme for a highly efficient distributed stream processing infrastructure. The infrastructure scales the processing of moving image recognition tasks according to the amount of data transmitted from the sensors. We implement a prototype of the proposed infrastructure using Ray, a distributed execution framework, and Apache Kafka, a distributed messaging system, and we evaluate its performance. The experimental results demonstrate that the proposed infrastructure is highly scalable.
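The abstract describes the general pattern but not its implementation details. In that pattern, sensors publish frames to Kafka, and a pool of Ray workers consumes and analyzes them. Capacity grows by adding partitions and workers as the data volume grows.

The following is a minimal, standard-library-only sketch of that pattern. It is not the authors' implementation. Several parts are hypothetical stand-ins:

* Plain lists stand in for Kafka topic partitions.
* A thread pool stands in for Ray workers.
* CRC32 stands in for Kafka's key-hashing partitioner.
* `recognize` is a placeholder for a video recognition model.

The names `Frame`, `partition_for`, `produce`, `consume_partition`, and `run_pipeline` are all illustrative assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """One video frame from a sensor (hypothetical record layout)."""
    camera_id: str
    seq: int
    payload: bytes


def partition_for(camera_id: str, num_partitions: int) -> int:
    # Kafka assigns keyed records to partitions by hashing the key;
    # CRC32 is a deterministic stand-in, not Kafka's actual hash.
    return zlib.crc32(camera_id.encode()) % num_partitions


def produce(frames: List[Frame], num_partitions: int) -> List[List[Frame]]:
    # Stand-in for a Kafka producer: frames with the same camera_id
    # always land in the same partition, preserving per-camera order.
    partitions: List[List[Frame]] = [[] for _ in range(num_partitions)]
    for frame in frames:
        partitions[partition_for(frame.camera_id, num_partitions)].append(frame)
    return partitions


def recognize(frame: Frame) -> str:
    # Placeholder for a moving image recognition model.
    return f"{frame.camera_id}:{frame.seq}:{len(frame.payload)}"


def consume_partition(partition: List[Frame]) -> List[str]:
    # Stand-in for one worker consuming one partition in offset order.
    return [recognize(frame) for frame in partition]


def run_pipeline(frames: List[Frame], num_partitions: int) -> Dict[int, List[str]]:
    # One worker per partition; scaling out means raising num_partitions.
    partitions = produce(frames, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(consume_partition, partitions))
    return dict(enumerate(results))


if __name__ == "__main__":
    frames = [Frame(f"cam{c}", s, b"\x00" * 1024) for c in range(4) for s in range(3)]
    for pid, labels in run_pipeline(frames, num_partitions=2).items():
        print(pid, labels)
```

The sketch keys partitioning by camera so that each camera's frames are analyzed in order by a single worker, while different cameras proceed in parallel. CPython threads do not give CPU parallelism for pure-Python work, so a real deployment would push this work to separate processes or nodes. Distributing that work across a cluster is the role the abstract assigns to Ray.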

Keyphrases: apache kafka, apache spark, cloud, distributed stream processing, ray

In: Gordon Lee and Ying Jin (editors). Proceedings of 34th International Conference on Computers and Their Applications, vol 58, pages 368-377.

BibTeX entry
@inproceedings{CATA2019:Construction_Scheme_Scalable_Distributed,
  author    = {Kasumi Kato and Atsuko Takefusa and Hidemoto Nakada and Masato Oguchi},
  title     = {Construction Scheme of a Scalable Distributed Stream Processing Infrastructure Using Ray and Apache Kafka},
  booktitle = {Proceedings of 34th International Conference on Computers and Their Applications},
  editor    = {Gordon Lee and Ying Jin},
  series    = {EPiC Series in Computing},
  volume    = {58},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {/publications/paper/LFCL},
  doi       = {10.29007/8lbk},
  pages     = {368-377},
  year      = {2019}}