DATA CLEANING ideas THEY DO NOT want you to learn about - Week 3 - IAT 461

Published: 03 June 2026
Channel: SFU Data & Dialogue Lab

Data and Data Cleaning Basics

This lecture introduces core data concepts and practical data-cleaning ideas for data science with examples from anime and gaming. It contrasts structured data (tables with variables, observations, and values) with unstructured data (text, images, audio) and shows how to create structure from unstructured sources. It explains tidy data by comparing wide vs long formats and demonstrates transforming tables using operations like melt, enabling easier filtering, grouping, and averaging. The session reviews common data formats (JSON, XML, Markdown) and tools (Python/pandas, visualization libraries, R, Excel/Google Sheets, notebooks) with an emphasis on reproducibility. It surveys data collection methods including open datasets, proprietary data, APIs (with an OpenStreetMap example), and web scraping/spidering (Beautiful Soup and cautions about rate limits/blocks), plus logging user interactions and sensor data. Data cleaning covers “garbage in, garbage out,” preserving raw data, distinguishing errors vs artifacts, identifying outliers (IQR, Z-score) with examples, handling missing values, normalizing strings, and unifying time zones and units.
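As a rough illustration of two ideas from the summary above (reshaping wide data to long with melt, and flagging outliers with the IQR rule), here is a minimal pandas sketch. The table, column names, and scores are made up for this example and are not taken from the lecture; the 1.5 × IQR cutoff is the conventional choice and is assumed here.

```python
import pandas as pd

# Hypothetical wide table: one row per character, one column per episode.
wide = pd.DataFrame({
    "character": ["A", "B", "C"],
    "ep1": [10, 12, 11],
    "ep2": [11, 13, 95],
    "ep3": [9, 12, 10],
})

# melt turns the episode columns into (episode, score) pairs,
# so each row holds exactly one observation (long / tidy format).
long = wide.melt(id_vars="character", var_name="episode", value_name="score")

# In long format, grouping and averaging is a single expression.
mean_by_character = long.groupby("character")["score"].mean()

# IQR rule (assumed 1.5 multiplier): flag scores outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = long["score"].quantile(0.25)
q3 = long["score"].quantile(0.75)
iqr = q3 - q1
is_outlier = (long["score"] < q1 - 1.5 * iqr) | (long["score"] > q3 + 1.5 * iqr)
outliers = long[is_outlier]

print(long)
print(mean_by_character)
print(outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgment call, which is one reason to keep the raw data untouched and filter on a copy.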

THERE IS NO REAL ONE PIECE CONTENT HERE. The example data is themed this way to make the concepts more interesting :).

This course is offered by Dr. Alireza Karduni of the School of Interactive Arts and Technology at Simon Fraser University.

For more information, visit:
https://datadialogue.vercel.app/
https://www.sfu.ca/siat.html

00:00 Course Logistics
01:02 What Is Data
01:40 Structured vs Unstructured
03:08 Tables and Tidy Data
04:31 Wide vs Long Format
07:59 Web Data Formats
11:06 Unstructured Data Examples
13:11 Tools for Data Work
17:17 Notebooks and Reproducibility
18:45 Where Data Comes From
19:46 APIs and Web Scraping
25:16 Logging and Sensors
26:51 Data Cleaning Basics
27:44 Artifacts and Errors
30:33 Outliers and Detection
35:23 Missing Values and Fixes
36:52 String and Unit Normalization
38:53 Wrap Up and Q&A

#datascience #dataanalysis #onepiece #jjk #slaythespire