Construction Scheme of a Scalable Distributed Stream Processing Infrastructure Using Ray and Apache Kafka

10 pages • Published: March 13, 2019

Abstract

The spread of various sensors and the development of cloud computing technologies enable the accumulation and use of large numbers of live logs in ordinary homes. To operate a service that utilizes sensor data, it is difficult to install servers and storage in ordinary homes and to analyze the collected data from sensors. Those data are typically transmitted from sensors to a cloud and analyzed in the cloud. However, services that involve moving image analysis must transfer large amounts of data continuously and require high computing power for analysis. Hence, it is highly difficult to process them in real time in the cloud using a conventional stream data processing framework. In this research, we propose a construction scheme for a highly efficient distributed stream processing infrastructure that enables scalable processing of moving image recognition tasks according to the amount of data that are transmitted from sensors. We implement a prototype system of the proposed distributed stream processing infrastructure using Ray and Apache Kafka, which is a distributed messaging system, and we evaluate its performance. The experimental results demonstrate that the proposed distributed stream processing infrastructure is highly scalable.

Keyphrases: apache kafka, apache spark, cloud, distributed stream processing, ray

In: Gordon Lee and Ying Jin (editors). Proceedings of 34th International Conference on Computers and Their Applications, vol 58, pages 368-377.