Flink Forward San Francisco, April 2018 #flinkforward
Building a scalable focused web crawler with Flink - Ken Krugler
Is it possible to build an efficient, focused web crawler using Flink? That question led to the creation of the flink-crawler open source project. In this talk I’ll discuss how we use Flink’s support for AsyncFunctions and iterations to create a scalable web crawler that continuously and efficiently performs a focused web crawl with no additional infrastructure. I’ll also cover some of the testing and debugging challenges we encountered when using AsyncFunctions and iterations.
https://data-artisans.com/