Agenda for Studio 2

  • Weeks 1-3: Data Extraction and Basics

Week 1: Introduction to SQL. Basic commands (SELECT, INSERT, UPDATE, DELETE).
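
A minimal sketch of the four Week 1 commands, assuming Python's built-in sqlite3 module as a stand-in for whatever database the studio actually uses. The `students` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")

# INSERT: add rows using ? placeholders instead of string formatting.
cur.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ana", 72), ("Ben", 58), ("Chen", 91)],
)

# UPDATE: change an existing row.
cur.execute("UPDATE students SET score = ? WHERE name = ?", (65, "Ben"))

# DELETE: remove every row that matches a condition (Ben, now 65).
cur.execute("DELETE FROM students WHERE score < ?", (70,))

# SELECT: read back what is left, highest score first.
rows = cur.execute("SELECT name, score FROM students ORDER BY score DESC").fetchall()
print(rows)  # [('Chen', 91), ('Ana', 72)]

conn.commit()
conn.close()
```

Placeholders (`?`) let the driver handle quoting, which is the usual way to avoid SQL injection.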

Week 2: Advanced SQL: Joins, subqueries. Pandas in Python: Introduction, Data manipulation.
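
A small illustration of the Week 2 SQL topics, again assuming sqlite3 and two hypothetical tables (`customers`, `orders`). It shows an inner join, a left join that keeps customers without orders, and a scalar subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 90.0);
""")

# INNER JOIN: only customers that have at least one order appear.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()

# LEFT JOIN: every customer appears; COUNT(o.id) ignores the NULLs,
# so a customer with no orders gets 0.
order_counts = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Subquery: orders above the average order amount.
big_orders = conn.execute("""
    SELECT id, amount FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
""").fetchall()

conn.close()
print(totals)        # [('Ben', 90.0), ('Ana', 50.0)]
print(order_counts)  # [('Ana', 2), ('Ben', 1), ('Chen', 0)]
print(big_orders)    # [(3, 90.0)]
```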

Week 3: Pandas: Filtering, aggregation. Introduction to Web Scraping with Scrapy.

  • Weeks 4-6: Data Cleaning and Preprocessing

Week 4: Handling Missing Data: Techniques such as imputation and deletion. Data Transformation: Introduction to normalization and scaling.
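
A plain-Python sketch of the Week 4 ideas, written without pandas so it runs with the standard library alone; in pandas the first two steps roughly correspond to `Series.dropna()` and `Series.fillna(value)`. The column values are made up, and mean imputation, min-max normalization and z-score standardization are just one common choice of each technique.

```python
from statistics import mean, pstdev

raw = [4.0, None, 10.0, 6.0, None]  # hypothetical column with missing values

# Deletion: simply drop the missing entries.
dropped = [x for x in raw if x is not None]

# Mean imputation: replace each missing entry with the mean of the observed ones.
fill = mean(dropped)
imputed = [fill if x is None else x for x in raw]

# Min-max normalization: rescale values to the range [0, 1].
lo, hi = min(imputed), max(imputed)
normalized = [(x - lo) / (hi - lo) for x in imputed]

# Standardization (z-score): mean 0, population standard deviation 1.
mu, sigma = mean(imputed), pstdev(imputed)
standardized = [(x - mu) / sigma for x in imputed]

print(dropped)                            # [4.0, 10.0, 6.0]
print([round(v, 3) for v in normalized])  # [0.0, 0.444, 1.0, 0.333, 0.444]
```

In a modelling workflow the fill value and scaling parameters would normally be computed on the training split only and then reused on the test split.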

Week 5: Regular Expressions (Regex) for Data Cleaning. Continued Data Transformation Techniques.
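
A short example of regex-based cleaning with Python's `re` module. The messy entries and the helper names (`digits_only`, `collapse_whitespace`, `extract_email`) are hypothetical, and the email pattern is deliberately simplified, not a full address validator.

```python
import re

raw_entries = [
    "  Phone: (555) 123-4567 ",
    "phone:555.987.6543",
    "Email me at  JANE.DOE@Example.com!!",
]

def digits_only(text: str) -> str:
    """Remove every non-digit character, e.g. to normalize phone numbers."""
    return re.sub(r"\D", "", text)

def collapse_whitespace(text: str) -> str:
    """Squeeze runs of whitespace to a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

# Simplified email pattern: local part, '@', then dot-separated domain labels.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_email(text: str):
    """Return the first email-like substring, or None if there is none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None

for entry in raw_entries:
    print(repr(collapse_whitespace(entry)), digits_only(entry), extract_email(entry))
```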

Week 6: Data Cleaning with NumPy and Pandas: Practical examples. Sample notebooks on Regex and data cleaning.

  • Weeks 7-9: Data Modeling: Supervised Learning

Week 7: Linear Regression: Basics and assumptions.
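
A from-scratch sketch of simple (one-predictor) linear regression using the closed-form least-squares estimates, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The data are invented and lie exactly on y = 1 + 2x so the result is easy to check; `fit_simple_ols` is a hypothetical helper name.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x

intercept, slope = fit_simple_ols(xs, ys)

# Residuals are what the usual assumption checks (linearity, independence,
# constant variance, roughly normal errors) are based on.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(intercept, slope)  # 1.0 2.0
print(residuals)         # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Real data will leave non-zero residuals; plotting them against the fitted values is a quick first check of those assumptions.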

Week 8: Classification Techniques: Logistic regression, decision trees. Regularization: Lasso and Ridge regression.

Week 9: Tree-Based Models: Decision trees, Random Forests. Brief comparison of XGBoost and LightGBM.

  • Weeks 10-11: Model Explainability

Week 10: Complex Models vs. Interpretability. Introduction to SHAP Values.

Week 11: Feature Importance Techniques. Case Studies on model interpretability.

  • Week 12: Experiment Tracking and Management

Week 12: Introduction to MLflow and its components. Tracking Experiments with MLflow.

  • Weeks 13-15: Model Deployment

Week 13: Introduction to Docker for data science. Basics of building interactive apps with Streamlit/Gradio.

Week 14: Deployment Best Practices. Version control and CI/CD concepts.

Week 15: Deploying a simple model using Docker. Creating interactive demos with Streamlit/Gradio.