How to Document Your ETL Pipeline for a Research Analytic Dataset

Published: 29 September 2024
Channel: Monika Wahi

*Note: I may be compensated, but you will not be charged, if you click on the links below.
💡 This video is part of the Public Health to Data Science Rebrand Program. Sign up for a 30-minute Zoom market research interview about this new program here: https://buff.ly/3UnLqmq

WANT TO SUPPORT MONIKA ON SOCIAL MEDIA?
❤️Sign up for Monika’s weekly data science e-newsletter: https://buff.ly/2UYW60l
🧡Follow/connect with Monika on LinkedIn:   / dethwench  
💛Follow Monika on Mastodon: https://fosstodon.org/@dethwench
💚Try Monika’s courses on LinkedIn Learning: https://buff.ly/2Ihd4Rq
💙Try Monika's boutique research methods and data science courses here: https://buff.ly/3zM243P

Theme music by CJ Hutchings, used with permission: https://dethwench.com/cjhutchings/
Buy Monika’s book, “Mastering SAS Programming for Data Warehousing”: https://buff.ly/31Hz1mg
Army MOS codes:
https://buff.ly/3Z7CPWj
SAS Data Integration Studio support page with tutorials:
https://buff.ly/41bySBP
Code/files on GitHub: https://mailchi.mp/1b4ce5c74c09/etlbe...
Blog: https://buff.ly/3YZws7Z

Timestamps and links as they come up are below:
00:18 Topics covered
01:34 What is an “ETL pipeline”? Definition of E in ETL (Extract)
03:32 Definition of T (Transform)
04:21 Definition of L (Load)
05:31 Why document an ETL pipeline for a research study
07:13 ...or for a data system
08:07 Minimum necessary ETL pipeline documentation
09:16 Review of directory set-up: Analytic environment
12:57 Crosswalk folder
13:32 Original and final datasets folder
13:55 Reviewing variables in native extract
17:26 ASVAB conundrum
20:59 Presenting data dictionary
27:39 Picklists in data dictionary
29:35 View analytic dataset
31:38 Review code
33:40 ETL pipeline diagram
35:36 Example of deidentified source data