Agenda for Studio 2
- Weeks 1-3: Data Extraction and Basics
Week 1: Introduction to SQL: Basic commands (SELECT, INSERT, UPDATE, DELETE).
Week 2: Advanced SQL: Joins, subqueries. Pandas in Python: Introduction, Data manipulation.
Week 3: Pandas: Filtering, aggregation. Introduction to Web Scraping with Scrapy.
- Weeks 4-6: Data Cleaning and Preprocessing
Week 4: Handling Missing Data: Techniques such as imputation and deletion. Data Transformation: Introduction to normalization and scaling.
Week 5: Regular Expressions (Regex) for Data Cleaning. Continued Data Transformation Techniques.
Week 6: Data Cleaning with NumPy and Pandas: Practical examples. Sample notebooks on Regex and data cleaning.
- Weeks 7-9: Data Modeling: Supervised Learning
Week 7: Linear Regression: Basics and assumptions.
Week 8: Classification Techniques: Logistic regression, decision trees. Regularization: Lasso and Ridge regression.
Week 9: Tree-Based Models: Decision trees, Random Forests. Brief comparison of XGBoost and LightGBM.
- Weeks 10-11: Model Explainability
Week 10: Complex Models vs. Interpretability. Introduction to SHAP Values.
Week 11: Feature Importance Techniques. Case Studies on model interpretability.
- Week 12: Experiment Tracking and Management
Week 12: Introduction to MLflow and its components. Tracking Experiments with MLflow.
- Weeks 13-15: Model Deployment
Week 13: Introduction to Docker for data science. Basics of building interactive apps with Streamlit/Gradio.
Week 14: Deployment Best Practices. Version control and CI/CD concepts.
Week 15: Deploying a simple model using Docker. Creating interactive demos with Streamlit/Gradio.