How to Do Exploratory Data Analysis the Right Way (Stage 5/10)

Published: 22 May 2026
Channel: Ibrahim Kocyigit

Exploratory Data Analysis is easy when you're not guessing. Every EDA tutorial starts with "let's download a dataset and see what we find." Ours starts with four stages of planning. And that changes everything.

This is Stage 5 of a 10-part series where I build a complete data science project following the Foundational Methodology for Data Science by John B. Rollins — one video per stage, from the first client meeting to a deployed model.

In Stage 1, we met with stakeholders and defined the business problem. In Stage 2, we chose our models and metrics. In Stage 3, we wrote the data shopping list. In Stage 4, we set up the project and built the ELT pipeline. By the time we open the data in this video, we already know what we're looking for, why we're looking for it, and what "good enough" means.

That's the whole point: Stages 1–4 turned EDA from "explore and pray" into a checklist. We knew to check class balance (because precision matters for our business problem). We knew to examine feature distributions (because our candidate models make different assumptions about the data). We knew which features to focus on and which to drop.
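As a rough illustration of the class-balance item on that checklist, here is a minimal pandas sketch. It is not the code from the Stage 5 notebook: the helper name `class_balance`, the `species` column name, and the toy labels are all assumptions made up for this example.

```python
import pandas as pd


def class_balance(labels: pd.Series, positive: str) -> pd.Series:
    """Percentage of rows in the positive class vs. all other classes."""
    # Boolean mask: True where the label equals the positive class
    is_positive = labels.eq(positive)
    share = is_positive.mean() * 100
    return pd.Series({positive: share, f"non-{positive}": 100 - share}).round(1)


# Tiny made-up sample (assumption), not the project's data
toy_labels = pd.Series(
    ["versicolor", "setosa", "versicolor", "virginica"], name="species"
)
balance = class_balance(toy_labels, positive="versicolor")
print(balance)
```

Framing the check as one class against the rest mirrors the versicolor vs. non-versicolor split used in this project, where precision on the positive class is the priority.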

But the plan didn't cover everything. The data had surprises — and those surprises will shape our next stages.

In this video (Stage 5 of 10): Data Understanding
→ Why methodology makes EDA structured instead of aimless
→ First look: shape, types, duplicates, and conflicting feature combinations
→ Missing value assessment (short section — but we explain why that matters)
→ Descriptive statistics: central tendency, spread, coefficient of variation
→ Class balance: 48.2% versicolor vs. 51.8% non-versicolor — and what that means for our precision-first strategy
→ Univariate analysis: histograms, density plots, and box plots for every feature
→ Bivariate & multivariate analysis: correlations, pair plots, and visual separability
→ The surprises: ~55% non-unique rows, 1,759 conflicting feature combinations, and irreducible noise
→ What all of this means for Stage 6 (Data Preparation)
→ First video in the series with real visualizations — matplotlib, seaborn, and actual insights
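Three of the checks listed above (non-unique rows, conflicting feature combinations, and coefficient of variation) can be sketched in a few lines of pandas. This is a hedged sketch, not the notebook's implementation. It assumes "non-unique" means a row that shares all its values with at least one other row, and "conflicting" means identical feature values that appear with more than one label. The function names, column names, and toy data are hypothetical.

```python
import pandas as pd


def first_look(df: pd.DataFrame, features: list[str], target: str) -> dict:
    """Count non-unique rows and feature combinations with conflicting labels."""
    # Rows that are exact copies of at least one other row (assumed definition)
    non_unique_rows = int(df.duplicated(keep=False).sum())
    # Number of distinct labels seen for each unique feature combination
    labels_per_combo = df.groupby(features)[target].nunique()
    # A combination "conflicts" when it maps to more than one label
    conflicting_combos = int((labels_per_combo > 1).sum())
    return {
        "rows": len(df),
        "non_unique_rows": non_unique_rows,
        "conflicting_combos": conflicting_combos,
    }


def coefficient_of_variation(df: pd.DataFrame, features: list[str]) -> pd.Series:
    """Sample standard deviation divided by the mean, per feature."""
    return df[features].std() / df[features].mean()


# Made-up toy data (assumption), not the 50K synthetic Iris dataset
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 5.1, 5.1, 6.0, 6.2],
        "petal_length": [1.4, 1.4, 1.4, 4.5, 4.8],
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    }
)
feature_cols = ["sepal_length", "petal_length"]

summary = first_look(toy, feature_cols, target="species")
cv = coefficient_of_variation(toy, feature_cols)
print(summary)
print(cv)
```

Conflicting combinations matter because no model can separate rows whose features are identical but whose labels differ, which is one way to read the "irreducible noise" mentioned above.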

---

📚 My full DS & ML repository (methodology + math + ML + MLOps):
→ https://github.com/ibrahim-kocyigit/k...

📚 Project repository:
→ https://github.com/ibrahim-kocyigit/i...

📄 Stage 5 Template:
→ https://github.com/ibrahim-kocyigit/k...

📄 Stage 5 Notebook:
→ https://github.com/ibrahim-kocyigit/i...

🌐 Dataset (50K Synthetic Iris by Dr. Ray Islam):
→ https://www.kaggle.com/datasets/drray...

---

🕐 Timestamps:
0:00 — Intro & Recap (Stages 1–4)
3:01 — What is the Data Understanding stage?
9:32 — Step 1: First Look
28:10 — Step 2: Missing Value Assessment
29:20 — Step 3: Descriptive Statistics
33:50 — Step 4: Univariate Analysis
39:44 — Step 5: Bivariate Analysis
49:41 — Step 6: EDA Findings Summary
57:27 — What's next: Stage 6 — Data Preparation

---

Previous videos:
Stage 1 — Business Understanding → How a Freelance Data Scientist Starts a Re...
Stage 2 — Analytic Approach → How a Freelance Data Scientist Picks the R...
Stage 3 — Data Requirements → Before You Touch the Data — Data Science D...
Stage 4 — Data Collection → First Code in a Data Science Project — And...

Next video: Stage 6 — Data Preparation (where we clean, transform, and engineer features based on everything we learned in this EDA — the data finally gets ready for modeling).

💬 Question for you: Do you do a full EDA before modeling — or do you just .describe() and move on? (Be honest.)

#DataScience #MachineLearning #CRISPDM #Methodology #IrisDataset #MLOps #FreelanceDataScience #DataScienceProject #EDA #ExploratoryDataAnalysis #DataVisualization #Seaborn #Pandas