Real-time analytics is a new trend in Big Data technologies, and it often has a significant business impact. A common architecture for real-time analytics at scale is based on Spark Streaming and Kafka. However, when you combine these technologies at high scale, you may find yourself searching for a solution that covers more complicated production use cases.
In this talk I will present my solution for combining Spark Streaming and Kafka without data loss when Spark jobs are restarted. I will walk through the steps I took toward the solution and describe the effect of each one. Finally, I will present the working solution.
Understanding Spark Streaming with Kafka and Druid
Full Talk (40 Minutes)