About the Talk:
Building a scalable data platform is a lot more than writing models and creating pipelines. This talk traces the evolution of a data pipeline
from its early struggles to a high-performing, scalable system. Initially, the team faced unreliable pipelines, frequent manual interventions,
and limited production support. The team had to learn quickly and focus on making small incremental improvements. Through the lens of
these early lessons, we discovered how small changes, such as a better testing framework, documentation, and monitoring, could
compound over time, driving massive long-term impact. As the system scales, these small improvements will act as a foundation for the
petabyte-scale data platform.
About the Speaker:
1. Anay Nayak - Solution Consultant, Sahaj Software
Anay Nayak is a seasoned technology leader with over nineteen years of experience driving innovation and success in the design and delivery of large-scale enterprise projects across diverse domains. Over the last 5+ years, he has been actively working on building data platforms and integrating data science models to deliver reliable and actionable business insights.
2. Amaan Shaikh - Solution Consultant, Sahaj Software
Amaan is a problem solver with 3 years of experience. He started as an individual contributor building interactive systems for business operations and transitioned into building scalable data applications. He has worked with technologies like Spark, Airflow, Scala, and Python, gaining experience in designing, optimizing, and ensuring the reliability of complex data workflows.
Chapters:
00:00 Introduction
01:50 Setting the Stage - Problem Statement
08:28 The Dark Age
20:10 Feudal Age
32:20 Castle Age
46:07 Imperial Age
57:06 Key Takeaways
59:47 Q&A