Deep-Dive
Expect to justify every decision you made. Focus on the "Why" rather than just the "What."
• Problem Statement: What specific business or research gap were you filling?
• Data Pipeline: How did you handle missing values, outliers, or imbalanced classes?
• Model Selection: Why choose XGBoost over a Random Forest or a Neural Network for this specific data?
• Evaluation: Which metrics mattered most (e.g., Precision vs. Recall) and why?
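To make the Evaluation point concrete, it can help to show how accuracy can look fine on imbalanced data while precision and recall tell a different story. The sketch below is illustrative only: the labels are made-up toy data and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up labels: only 2 of 10 examples are positive (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy is 0.80 even though the model finds only half of the positives and half of its positive predictions are wrong. That gap is the usual argument for reporting precision and recall on imbalanced problems.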
2. Basic ML Algorithms
You should be able to explain these as if speaking to a non-technical stakeholder, then dive into the math.
• Linear/Logistic Regression: Understanding the assumptions (linearity and homoscedastic errors for linear regression; linearity in the log-odds for logistic regression) and regularization (L1 and L2).
• Decision Trees & Ensembles: How entropy and Gini impurity are used to choose splits; the difference between Bagging and Boosting.
• K-Nearest Neighbors (KNN): The impact of distance metrics and the "curse of dimensionality."
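If you are asked to "dive into the math" on decision trees, a small worked example of the two impurity measures is often enough. The following is a minimal sketch using the standard definitions (Gini = 1 - sum of squared class proportions, entropy = -sum of p * log2(p)). The function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). Zero for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k)). Zero for a pure node."""
    return sum(-p * log2(p) for p in class_proportions(labels))


# Toy nodes (made-up labels): one perfectly mixed, one pure.
mixed_node = ["yes", "yes", "no", "no"]
pure_node = ["yes", "yes", "yes", "yes"]

print(gini(mixed_node), entropy(mixed_node))
print(gini(pure_node), entropy(pure_node))
```

Both measures are zero for a pure node and largest for an evenly mixed one. A tree picks the split that reduces the chosen measure the most.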
3. Probability & Statistics
This is the "engine" under the hood of ML. Common topics include:
• Distributions: Knowing when to use Normal, Bernoulli, or Poisson distributions.
• Bayes' Theorem: Calculating conditional probability (often framed as a disease testing word problem).
• Hypothesis Testing: Defining p-values, Type I/II errors, and A/B testing logic.
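• Central Limit Theorem: Why the distribution of the sample mean approaches a normal distribution as the sample size grows, even when the underlying data are not normally distributed (given finite variance).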
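The disease-testing version of Bayes' Theorem is worth practicing with real numbers. The sketch below assumes a hypothetical test with 1% prevalence, 99% sensitivity, and 95% specificity; these figures are invented for illustration, and `posterior_given_positive` is a hypothetical helper name.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem.

    prevalence  = P(disease)
    sensitivity = P(positive | disease)
    specificity = P(negative | no disease)
    """
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    # Denominator is P(positive), by the law of total probability.
    return true_positive / (true_positive + false_positive)


# Hypothetical numbers, chosen only for illustration.
p = posterior_given_positive(prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(f"P(disease | positive) = {p:.3f}")
```

With these assumed numbers the posterior is about 0.167: even with a positive result from a fairly accurate test, the disease is still unlikely because it is rare. Explaining why the low base rate dominates is usually the point of the question.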