Building a scalable focused web crawler with Flink - Ken Krugler

Published: October 8, 2024
Channel: Flink Forward

Flink Forward San Francisco, April 2018 #flinkforward

Is it possible to build an efficient, focused web crawler using Flink? That was the question that led to the creation of the flink-crawler open source project. In this talk I’ll discuss how we use Flink’s support for AsyncFunctions and iterations to create a scalable web crawler that continuously and efficiently performs a focused web crawl with no additional infrastructure. I’ll also discuss some of the testing and debugging challenges encountered when using features such as AsyncFunctions and iterations.
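The abstract names two mechanisms: asynchronous fetching (Flink's AsyncFunction) and iterations, a feedback loop that routes newly discovered URLs back into the crawl. The code below is a minimal plain-Python sketch of that pattern under stated assumptions. It is not the flink-crawler code and not Flink's API. A bounded batch of concurrent fetches stands in for an async operator's limit on in-flight requests, and pushing outlinks back into a priority frontier stands in for the iteration's feedback edge. The in-memory `FAKE_WEB`, the keyword-based `score()` and all parameter names are hypothetical. They exist only so the sketch runs offline.

```python
import asyncio
import heapq

# Hypothetical in-memory "web": url -> (page text, outlinks).
# Stands in for real HTTP fetches so the sketch runs offline.
FAKE_WEB = {
    "a": ("flink streaming", ["b", "c"]),
    "b": ("cooking recipes", ["e"]),
    "c": ("flink iterations", ["d"]),
    "d": ("flink async io", []),
    "e": ("baking bread", []),
}


def score(text, topic="flink"):
    """Hypothetical relevance score: 1.0 if the topic word appears, else 0.0."""
    return 1.0 if topic in text.split() else 0.0


async def fetch(url):
    """Simulated non-blocking fetch; a real crawler would issue async HTTP here."""
    await asyncio.sleep(0)  # yield control to the event loop, like a network call
    return FAKE_WEB.get(url, ("", []))


async def crawl(seeds, max_pages=10, max_in_flight=2, min_score=0.5):
    # Frontier is a max-heap on priority (scores are negated for heapq).
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched = []
    while frontier and len(fetched) < max_pages:
        # Take the best-scoring URLs, bounded by the in-flight limit
        # and by the remaining page budget.
        n = min(max_in_flight, len(frontier), max_pages - len(fetched))
        batch = [heapq.heappop(frontier)[1] for _ in range(n)]
        # Fetch the whole batch concurrently; gather keeps result order.
        results = await asyncio.gather(*(fetch(u) for u in batch))
        for url, (text, links) in zip(batch, results):
            fetched.append(url)
            page_score = score(text)
            # Feedback loop: only outlinks of relevant pages re-enter the
            # frontier, which is what keeps the crawl focused.
            if page_score >= min_score:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        heapq.heappush(frontier, (-page_score, link))
    return fetched
```

Running `asyncio.run(crawl(["a"]))` returns `["a", "b", "c", "d"]`. Page `"e"` is never fetched because its only inbound link comes from the off-topic page `"b"`. In a streaming system the loop would not be a `while` over a local heap. Fetching and scoring would instead be operators, with discovered URLs flowing back upstream, but the pruning decision stays the same.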

https://data-artisans.com/