This is a continuation from Part 1.
Vladimir Bacvanski of SciSpike (described on its web site as a leading training and consulting firm specializing in advanced object-oriented and enterprise technologies) gave a talk on Flink, another streaming and batch dataflow engine.
Its features are briefly documented on its project page:
- High performance
- Exactly-once Semantics for Stateful Computations
- Continuous Streaming Model with Flow Control
- Fault-tolerance via Lightweight Distributed Snapshots
- One Runtime for Streaming and Batch Processing
- Memory Management
- Iterations and Delta Iterations
- Program Optimizer
- Batch Processing Applications
- Streaming Data Applications
- Library Ecosystem
- Broad Integration
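To make the "One Runtime for Streaming and Batch Processing" item a bit more concrete, here is a minimal sketch in plain Python. It is not Flink's API, and all the names in it (`running_word_count`, `batch_word_count`, `endless_lines`) are made up for illustration. It only shows the general idea, as I understand it, that a batch job can be treated as a stream that happens to end, so the same operator can serve both cases.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple


def running_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Consume lines one at a time and emit (word, count_so_far) updates."""
    counts: Counter = Counter()  # operator state kept between records
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch mode: run the same operator on a bounded input, keep final counts."""
    final: Dict[str, int] = {}
    for word, count in running_word_count(lines):
        final[word] = count  # later updates overwrite earlier ones
    return final


def endless_lines() -> Iterator[str]:
    """Hypothetical unbounded source: it never runs out of input."""
    while True:
        yield "to be or not to be"


# Bounded input (batch): the stream simply ends and we read the final state.
batch_result = batch_word_count(["to be or not to be", "to stream or to batch"])

# Unbounded input (streaming): we can only ever look at the updates so far.
first_updates = list(islice(running_word_count(endless_lines()), 6))
```

The only difference between the two modes here is whether the input ever ends. A real engine adds distribution, fault tolerance, and optimization on top, none of which this sketch attempts.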
Flink was originally developed in 2009 in Germany at the Technical University of Berlin and became an Apache Incubator project in April 2014. In January 2015, Flink was accepted as an Apache top-level project. Flink is very new, and many wonder what it is and how it can compete in the increasingly crowded market for streaming dataflow engines. A few articles talk about it (here and here), but beyond its project pages there is not much information available. data Artisans is a commercial company that uses Flink, but unfortunately even data Artisans has little information about it.
It is rather hard to know which package to use for streaming analytics. I will cover that in a separate post.
There was no session on Samza at the NoSQL conference, but it was referenced by several speakers. According to its project page, Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
Samza was originally developed at LinkedIn (this site contains a lot of interesting posts related to web-scale data processing) and later open-sourced through Apache. It entered the Apache Incubator in September 2013 and became a top-level project in January 2015. A video presentation is available here and a short description is here.
Again, it is difficult to know which package to choose for stream processing. I will touch on this issue in a separate post.