Open Source Packages and NoSQL Now Conference in San Jose: Part 2

This is a continuation from Part 1.

In this blog, I will discuss Flink and Samza.

Flink

Vladimir Bacvanski of SciSpike (according to its web site, a leading training and consulting firm specializing in advanced object-oriented and enterprise technologies) gave a talk on Flink, another streaming and batch dataflow engine.

Its features are briefly documented on its project page:

  • High performance
  • Exactly-once Semantics for Stateful Computations
  • Continuous Streaming Model with Flow Control
  • Fault-tolerance via Lightweight Distributed Snapshots
  • One Runtime for Streaming and Batch Processing
  • Memory Management
  • Iterations and Delta Iterations
  • Program Optimizer
  • Batch Processing Applications
  • Streaming Data Applications
  • Library Ecosystem
  • Broad Integration
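
To make the "exactly-once" and "snapshot" items above a little more concrete, here is a minimal single-process sketch in Python. It is not Flink's API or its actual distributed snapshot algorithm; the class name `CountingJob` and its methods are hypothetical, chosen only for illustration. The idea it shows is that if the operator state and the source position are saved together, then after a failure the job can roll back both and replay, so each record affects the final state exactly once.

```python
import copy
from collections import Counter


class CountingJob:
    """Toy stateful job (hypothetical, not Flink): counts words from a replayable source."""

    def __init__(self, records, snapshot_every=3):
        self.records = records            # replayable input, addressed by offset
        self.snapshot_every = snapshot_every
        self.offset = 0                   # position of the next record to read
        self.counts = Counter()           # operator state
        self.snapshot = (0, Counter())    # last consistent (offset, state) pair

    def take_snapshot(self):
        # Save the source offset and the state together, as one consistent unit.
        self.snapshot = (self.offset, copy.deepcopy(self.counts))

    def restore(self):
        # Roll back both the offset and the state to the last snapshot.
        self.offset, saved = self.snapshot
        self.counts = copy.deepcopy(saved)

    def run(self, fail_at=None):
        while self.offset < len(self.records):
            if fail_at is not None and self.offset == fail_at:
                fail_at = None            # simulate a single crash
                self.restore()            # work done since the snapshot is discarded
                continue
            self.counts[self.records[self.offset]] += 1
            self.offset += 1
            if self.offset % self.snapshot_every == 0:
                self.take_snapshot()
        return dict(self.counts)


words = ["a", "b", "a", "c", "b", "a", "a"]
clean = CountingJob(words).run()
recovered = CountingJob(words).run(fail_at=5)
print(clean == recovered)
```

The run with a simulated crash replays two records after restoring, yet ends with the same counts as the failure-free run, because the replayed records are applied to the rolled-back state rather than on top of the lost work.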

Flink was originally developed in 2009 in Germany at the Technical University of Berlin and became an Apache Incubator project in April 2014. In January 2015, Flink was accepted as an Apache top-level project. Flink is very new, and many wonder what it is and how it can compete in the increasingly crowded market for streaming dataflow engines. A few articles talk about it (here and here), but other than its project pages, there is not much information available. data Artisans is a commercial company that uses Flink, but unfortunately even its site has little information about it.

Flink is available here. If you need more detailed information, here’s a good slide presentation.

It is rather hard to know which package to use when it comes to streaming analytics. I will cover that in a separate blog.

Samza

There was no session on Samza at the NoSQL conference, but it was referenced by several speakers. According to its project page, Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
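
The processing model behind this description can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not Samza's or Kafka's API: the names `partition_for`, `PageViewCounterTask`, and `run_job` are hypothetical. It assumes that messages are split into partitions by key and that each partition is handled, in order, by its own task with its own local state.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Deterministic key-based partitioning (crc32 is an arbitrary choice for this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PageViewCounterTask:
    """Hypothetical task: handles one partition and keeps its own local state."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key, value):
        # Called once per message, in the order the messages appear in the partition.
        self.counts[key] += 1


def run_job(messages, num_partitions=2):
    # Stand-in for the messaging layer: append each message to its partition's log.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))

    # One task per partition; in a real deployment these would run in separate containers.
    tasks = [PageViewCounterTask() for _ in range(num_partitions)]
    for task, log in zip(tasks, partitions):
        for key, value in log:
            task.process(key, value)
    return tasks


views = [("alice", "/home"), ("bob", "/jobs"), ("alice", "/feed"), ("carol", "/home")]
for i, task in enumerate(run_job(views)):
    print(i, dict(task.counts))
```

Because all messages with the same key land in the same partition, each task can keep per-key state locally without coordinating with the other tasks.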

Samza was originally developed at LinkedIn (this site contains a lot of interesting posts related to web-scale data processing) and was later made open source at Apache. It entered the Apache Incubator in September 2013 and became a top-level project in January 2015. A video presentation is available here and a short description is here.

Again, it is difficult to know which package to choose for streaming processing. I will touch on this issue in a separate blog.

Zen Kishimoto

About Zen Kishimoto

Seasoned research and technology executive with various functional expertise, including roles in analyst, writer, CTO, VP Engineering, general management, sales, and marketing in diverse high-tech and cleantech industry segments, including software, mobile embedded systems, Web technologies, and networking. Current focus and expertise are in the area of the IT application to energy, such as smart grid, green IT, building/data center energy efficiency, and cloud computing.
