Integrating Kafka and Spark Streaming: Code Examples and State of the Game

Spark Streaming has been getting some attention lately as a real-time data processing tool, often mentioned alongside Apache Storm. If you ask me, no real-time data processing tool is complete without Kafka integration :-), hence I added an example Spark Streaming application to kafka-storm-starter that demonstrates how to read from Kafka and write to...

Wed Oct 1, 2014 22:41
Apache Storm 0.9 training deck and tutorial

Today I am happy to share an extensive training deck on Apache Storm version 0.9, which covers Storm’s core concepts, operating Storm in production, and developing Storm applications. I also discuss data serialization with Apache Avro and Twitter Bijection. The training deck (130 slides) is aimed at developers, operations, and architects. What...

Mon Sep 15, 2014 13:47
Apache Kafka 0.8 training deck and tutorial

Today I am happy to share an extensive training deck on Apache Kafka version 0.8, which covers Kafka’s core concepts, operating Kafka in production, and developing Kafka applications. I also discuss data serialization with Apache Avro and Twitter Bijection. The training deck (120 slides) is aimed at developers, operations, and architects. What...

Thu Aug 21, 2014 00:05
Apache Kafka: Beyond the Basics

TODO: Write intro. Further readings. Basics: Official docs; 0.8 Producer example; Building LinkedIn’s Real-time Activity Data Pipeline (PDF), paper by LinkedIn engineers, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2012. Producing and consuming data: Understanding the Kafka Async Producer, Gnip, November...

Thu Jun 5, 2014 11:33
Storm configuration settings

TODO: Write intro. Configuration notes. Storm. Parallelism settings. conf.setNumWorkers(n): Equivalent to TOPOLOGY_WORKERS. Number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components...

Thu Jun 5, 2014 11:33
Integrating Kafka and Storm: Code Examples and State of the Game

The only thing that’s even better than Apache Kafka and Apache Storm is to use the two tools in combination. Unfortunately, their integration can be, and still is, a pretty challenging task, at least judging by the many discussion threads on the respective mailing lists. In this post I am introducing kafka-storm-starter, which contains many code examples...

Wed May 28, 2014 23:32
