Praveen Chitrada presented the session "Building a fast, scalable, efficient operational analytics and reporting application using MemSQL, Docker, Airflow, and Prometheus" at Strata Data Conference 2019. Akamai moved to SingleStore for faster, more efficient data ingest. Airflow tasks move data between clusters, and Prometheus collects Airflow and SingleStore metrics, which are displayed in Grafana dashboards. Developers run regression tests locally against a containerized version of the system.
0:00 - Introducing Praveen Chitrada and Akamai
1:55 - Agenda
2:41 - Prism architecture overview
3:33 - Network statistics use-case overview
4:53 - Problem statement: need SLA for data ingest
6:45 - #MemSQL strengths and architecture
7:52 - POC Benchmarks of existing system
9:06 - Initial implementation metrics and trends
9:41 - MemSQL ingest with pipelines: #ELT without third-party tools
10:47 - Transform during ingest in a MemSQL pipeline with #Python
11:13 - No need to stop ingest for month-end processing
11:41 - #RowStore or #ColumnStore?
12:30 - RowStore is in memory, ColumnStore is on disk
12:53 - Shard correctly to avoid Table Skew
14:22 - We use a lot of temporary tables
14:49 - We only replicate metadata to secondary cluster for scale
15:52 - Airflow jobs on both clusters help move to Active/Active
16:14 - #Airflow jobs overview
17:20 - Airflow workers are on MemSQL master aggregators
17:30 - Airflow graph shows job dependencies and job status
18:57 - Airflow tree view shows heat-map of job results
20:06 - Custom Airflow dashboard of MemSQL task status
21:14 - Dive straight into job logs
21:30 - Airflow dashboards make it easy to operationalize customer support
22:03 - Build custom Airflow plugins for custom dashboards to better show business metrics across regions
23:33 - Monitor clusters with Prometheus and Grafana
24:17 - Prometheus exporter on each machine in the cluster to harvest cluster metrics
24:51 - Send Prometheus alerts on unusual data
25:20 - Akamai's Grafana dashboard shows 7 days of MemSQL cluster events
25:50 - Analyze a CPU spike, double-click into Airflow task logs
26:32 - Grafana: MemSQL RowStore RAM usage
27:18 - Forecast maintenance based on anomalies
27:39 - Grafana: network usage overview shows backup traffic
28:31 - Grafana: disk I/O highlights outlier events
29:13 - Containerize Airflow and Prometheus workloads in #Docker
29:50 - We built an application development kit as a container set
30:30 - Black-box testing with Robot Framework
30:53 - Developers can run regression tests locally, speeds development
31:31 - Sample build output
32:17 - Future plans: add #Kafka, Flink, and OpenShift
33:09 - Thank you, begin Q&A
33:30 - Can you monitor at the edge and catch failures in real time?
34:12 - What open-source tools are used?
34:32 - Do you use Celery with Airflow?
35:03 - What is the backup strategy?
36:16 - Do you use Airflow task controllers?
36:50 - How do you use Spark?
37:18 - Why are you moving to OpenShift?
37:53 - What data is available in the build?
38:36 - What is the performance difference between temp tables and joins?
#MemSQL is now #SingleStore