datawithstata

Posts

Showing posts from April, 2026

Module 8: Interpretation for Policy and Health Research

April 15, 2026

This final module focuses on translation rather than estimation. Students learn how to move from model output to clear, defensible statements for clinicians, public health practitioners, policymakers, and other non-technical audiences. The goal is to teach careful language: distinguish odds from risks, statistical significance from substantive importance, and model stability from policy relevance. Good interpretation also means acknowledging limitations such as convergence problems, uncertain model fit, or the possibility that a statistically significant effect is still modest in practical terms.

Interpretation checklist
Name the outcome clearly.
State whether the estimate is an odds ratio, risk ratio, probability, or marginal effect.
Translate the estimate into plain language.
Note important caveats about prevalence, model assumptions, and generalizability.

Example phras...
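One translation that often helps non-technical audiences is turning an odds ratio into absolute risks at a stated baseline. The Python sketch below is an illustration written outside STATA: the function names are hypothetical, it assumes the odds ratio applies unchanged at the chosen baseline risk, and the baselines of 2% and 40% are made-up values chosen to contrast a rare and a common outcome.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Return the risk implied by applying an odds ratio to a baseline risk."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1 - baseline_risk)  # risk -> odds
    new_odds = odds_ratio * baseline_odds                 # apply the odds ratio
    return new_odds / (1 + new_odds)                      # odds -> risk


def plain_language(baseline_risk, odds_ratio):
    """Hypothetical helper: one plain-language sentence for a lay audience."""
    new_risk = risk_from_odds_ratio(baseline_risk, odds_ratio)
    change_pp = 100 * (new_risk - baseline_risk)
    return (f"With an odds ratio of {odds_ratio:.1f}, a risk of {baseline_risk:.1%} "
            f"becomes about {new_risk:.1%}, a change of {change_pp:.1f} percentage points.")


# Same odds ratio, two hypothetical baseline risks (rare vs common outcome)
for p0 in (0.02, 0.40):
    print(plain_language(p0, 1.8))
```

The same odds ratio of 1.8 corresponds to a change of about 1.5 percentage points at a 2% baseline but about 14.5 points at a 40% baseline, which is why the checklist asks for caveats about prevalence.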

Module 7: Choosing Between Logistic, Log-Binomial, Poisson, and Probit Models

April 15, 2026

This module treats model choice as a practical decision rather than a theoretical contest. The course's comparative notes distinguish four options: logistic regression for odds ratios, log-binomial regression for direct risk ratios, Poisson regression with robust standard errors as a practical workaround for estimating risk ratios, and probit regression as a latent-variable probability model. A particularly valuable part of the teaching notes is the real mistreatment example, in which a log-binomial model was attempted but failed to converge. That example makes the teaching point clear: students must learn to balance interpretability, outcome prevalence, and whether the model actually behaves well in the data.

Decision framework
Logistic: stable and widely used when odds ratios are acceptable.
Log-binomial: attractive for direct risk ratios, but convergence can be fragile.
Poisson with...
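One reason log-binomial fits can fail to converge is that the log link does not keep predicted risks below 1. The Python sketch below, written outside STATA, uses made-up coefficients (a risk of 0.30 at x = 0 and a risk ratio of 1.5 per unit of a hypothetical predictor x) to compare the log link used by log-binomial regression with the logit link used by logistic regression. It illustrates the boundary problem only; it does not fit either model.

```python
import math


def log_link_risk(b0, b1, x):
    """Log-binomial scale: log(risk) = b0 + b1*x, so risk = exp(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)


def logit_link_risk(b0, b1, x):
    """Logistic scale: log(risk / (1 - risk)) = b0 + b1*x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


# Hypothetical coefficients: risk 0.30 at x = 0, risk ratio 1.5 per unit of x
b0, b1 = math.log(0.30), math.log(1.5)

for x in range(5):
    risk = log_link_risk(b0, b1, x)
    flag = "  <- not a valid probability" if risk > 1 else ""
    print(f"x={x}: log-link risk={risk:.3f}{flag}")

# The logit link maps any linear predictor into (0, 1), even for large x
print(f"logit link at x=10: {logit_link_risk(b0, b1, 10):.3f}")
```

Because the log link allows fitted risks above 1, the estimator can be pushed toward a boundary where it fails to converge, while the logit link never leaves the probability scale. In STATA itself the log-binomial model is usually fitted with glm y x, family(binomial) link(log) eform and the Poisson workaround with poisson y x, vce(robust) irr, where y and x are placeholder names.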

Module 6: Odds Ratios vs Risk Ratios

April 15, 2026

This is a critical interpretation module for epidemiology and health research. The course notes emphasize a practical rule: when the outcome is rare, odds ratios and risk ratios are close, but when the outcome is common, odds ratios can diverge substantially from risk ratios and, if read as risk ratios, overstate the apparent magnitude of association. The module also highlights why risk ratios are often easier to explain to clinicians and policymakers, while odds ratios arise naturally in logistic regression and remain mathematically convenient.

Simple guide
Rare outcomes: OR and RR are often similar.
Common outcomes: the divergence becomes important.
Always state clearly whether you are talking about odds, risk, or probability.

Example language
Risk ratio: the event is 1.8 times as likely in one group as in another.
Odds ratio: the odds of the event are 1.8 times as high in one group as in another.
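The rare-versus-common rule can be checked with a 2x2 table. The Python sketch below, written outside STATA, computes both ratios from hypothetical counts of 1,000 people per group; the counts are made up so that the risk ratio is 2 in both scenarios.

```python
def risk_and_odds_ratios(a, b, c, d):
    """2x2 table: a, b = events and non-events in the exposed group;
    c, d = events and non-events in the unexposed group."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    odds_ratio = (a / b) / (c / d)
    return risk_ratio, odds_ratio


# Hypothetical counts, 1,000 people per group; both scenarios have RR = 2
scenarios = {
    "rare outcome (2% vs 1%)": (20, 980, 10, 990),
    "common outcome (60% vs 30%)": (600, 400, 300, 700),
}
for label, counts in scenarios.items():
    rr, or_ = risk_and_odds_ratios(*counts)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
```

In the rare scenario the odds ratio (about 2.02) is close to the risk ratio of 2; in the common scenario it rises to 3.5 even though the risk ratio is still 2.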

Module 5: Logistic Regression (Core Module)

April 15, 2026

Logistic regression is the central model in this course because it is the standard approach for binary outcomes. It models the probability that an event occurs while keeping predicted values within the 0 to 1 range, and its results are commonly reported as odds ratios. This module explains the model in applied terms: what the coefficients mean, why exponentiating them gives odds ratios, and how to distinguish a change in odds from a change in probability.

What students should understand
Use logistic regression when the dependent variable is coded 0/1.
logit reports coefficients in log-odds; logistic reports odds ratios directly.
Odds ratios are not the same as risk ratios.

STATA commands
* Coefficients on the log-odds scale
logit outcome x1 x2 x3
* Same model, reported as odds ratios
logistic outcome x1 x2 x3
* Predicted probability that outcome = 1, then the average predicted probability
predict phat
margins
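The difference between a change in odds and a change in probability can be checked numerically. The Python sketch below, written outside STATA, uses hypothetical coefficients (b0 = -2.0, b1 = 0.7) rather than estimates from any dataset. It shows that exp(b1) is the same odds ratio for every one-unit increase in x, while the matching change in probability depends on the starting point, which is why margins is a useful companion to logistic.

```python
import math


def inv_logit(z):
    """Convert a log-odds value into a probability."""
    return 1 / (1 + math.exp(-z))


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


b0, b1 = -2.0, 0.7          # hypothetical coefficients on the log-odds scale
odds_ratio = math.exp(b1)   # the odds ratio logistic would report for x

rows = []
for x in range(4):
    p_now = inv_logit(b0 + b1 * x)
    p_next = inv_logit(b0 + b1 * (x + 1))
    rows.append((x, odds(p_next) / odds(p_now), p_next - p_now))
    print(f"x {x}->{x + 1}: odds ratio = {rows[-1][1]:.3f}, "
          f"change in probability = {rows[-1][2]:+.3f}")

print(f"exp(b1) = {odds_ratio:.3f}")
```

Every row reports the same odds ratio, but the probability changes differ from row to row, so a single odds ratio does not map to a single change in probability.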

Module 4: Regression with Dummy Variables

April 15, 2026

Dummy variables are essential whenever categorical predictors appear in a model, including sex or gender, hospital type, race or ethnicity, treatment group, or place of residence. This module explains how to include those predictors correctly and why one category must be omitted and treated as the reference group. Students learn about the dummy variable trap (including a dummy for every category alongside the constant makes the dummies perfectly collinear), the logic of the baseline category, and how to interpret coefficients as differences relative to that reference group.

Key points
Use one fewer dummy than the total number of categories.
The omitted category becomes the reference group.
Interpretation is always relative to that baseline.

STATA examples
* Create one indicator per category of group (d_1, d_2, ...)
tab group, gen(d_)
* Omit d_1 so the first category is the reference (assumes group has three categories)
regress y x1 d_2 d_3
* Factor-variable notation: STATA creates the indicators and omits the base level
regress y i.group x1
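The reference-category logic can be sketched in a few lines of Python, outside STATA. The make_dummies helper is hypothetical: it does by hand what tab group, gen(d_) followed by omitting one indicator does, and what i.group does automatically (by default STATA uses the lowest level as the base). The residence values are made up.

```python
def make_dummies(values, reference):
    """Return (kept_levels, rows): one 0/1 column per non-reference level."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical place-of-residence variable with three categories
residence = ["rural", "urban", "suburban", "urban", "rural"]
kept, rows = make_dummies(residence, reference="rural")
print(kept)
for v, r in zip(residence, rows):
    print(f"{v:>8}: {r}")

# Dummy variable trap: with all three dummies, every row sums to 1,
# which exactly duplicates the constant (intercept) column.
all_levels = sorted(set(residence))
full = [[1 if v == lvl else 0 for lvl in all_levels] for v in residence]
print(all(sum(r) == 1 for r in full))
```

Rural respondents are coded 0 on both kept dummies, so each coefficient on a kept dummy is read as a difference from the rural group.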

Module 3: Binary Outcomes and Why OLS Fails

April 15, 2026

Health research often focuses on binary outcomes such as disease versus no disease, admitted versus not admitted, and survived versus did not survive. A central lesson from the logistic regression notes is that OLS can produce fitted values below 0 or above 1, which makes no sense when the quantity of interest is a probability. This module uses that problem to motivate the move from OLS to logistic and related models. The key point is practical: binary outcomes require a model that respects the probability scale.

Why this matters
Linear predictions can fall outside the range of valid probabilities.
Binary outcomes violate the logic of a simple linear fit: a straight line is unbounded, but a probability is not.
This is why logistic regression is the appropriate choice rather than an optional refinement.

STATA illustration
* Linear probability model: fitted values can fall below 0 or above 1
regress hiqual avg_ed
predict yhat
* Logistic model: predicted probabilities stay between 0 and 1
logit hiqual avg_ed
predict phat
* Compare the ranges of the two sets of predictions
summarize yhat phat
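The out-of-range problem is easy to reproduce. The Python sketch below, written outside STATA, fits a straight line by least squares to a tiny made-up 0/1 outcome (hypothetical values, not the hiqual data), giving the same estimates regress would give for a single predictor, and flags fitted values outside the 0 to 1 range.

```python
def simple_ols(x, y):
    """Intercept and slope for y = b0 + b1*x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical binary outcome (0/1) and a continuous predictor
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
for xi, f in zip(x, fitted):
    note = "  <- outside [0, 1]" if f < 0 or f > 1 else ""
    print(f"x={xi}: fitted value {f:+.3f}{note}")
```

The lowest and highest values of x produce fitted "probabilities" below 0 and above 1, which is the problem summarize yhat phat is meant to expose in STATA.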

Module 2: Foundations of Regression (OLS)

April 15, 2026

Ordinary least squares (OLS) is the starting point for most applied analysis because it provides a simple way to estimate linear relationships between an outcome and one or more predictors. In this course, OLS matters both as a method in its own right and as the baseline model students must understand before learning why binary-outcome models require a different framework. This module focuses on coefficient interpretation, the idea of a fitted line, and the logic of estimating expected changes in the dependent variable while holding other variables constant.

Learning goals
Interpret coefficients clearly.
Understand what OLS estimates: the line (or plane) that minimizes the sum of squared residuals.
Recognize why OLS is the foundation for later modules.

Core STATA command
* Each coefficient: expected change in y per one-unit increase in that predictor, holding the others constant
regress y x1 x2 x3
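To make "holding other variables constant" concrete, the Python sketch below solves the OLS normal equations for two predictors. It is a teaching sketch written outside STATA, not a description of how regress computes its estimates; the ols helper and the data are hypothetical, and the data are constructed so that y = 1 + 2*x1 + 3*x2 exactly. The estimates should therefore recover 1, 2, and 3: a one-unit increase in x1 raises the expected y by 2 when x2 is held fixed.

```python
def ols(X, y):
    """Least-squares coefficients (intercept first) from the normal equations
    (X'X) b = X'y, solved by Gauss-Jordan elimination with partial pivoting."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 1]]  # columns: x1, x2 (hypothetical)
y = [4, 3, 8, 7, 12, 14]                              # built as 1 + 2*x1 + 3*x2
b = ols(X, y)
print([round(v, 6) for v in b])
```

The singular-matrix check fires when one predictor is an exact linear combination of others, which is the same problem behind the dummy variable trap in Module 4.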

Module 1: Introduction to Data, Variables, and STATA

April 15, 2026

This module introduces the building blocks of quantitative analysis: variable types, data structure, value labels, and basic STATA workflow. A recurring theme of the course is that the type of outcome variable determines which model you should use, so this module lays the conceptual groundwork for everything that follows. Students should be able to distinguish continuous, categorical, binary, and count outcomes; understand why coding matters; and begin navigating STATA confidently.

Key ideas
Variable type shapes model choice.
Labels and coding choices matter for interpretation.
A clean setup in STATA makes later regression work easier.

Starter commands
* Overview of variables, storage types, and value labels
describe
codebook
* Frequencies with value labels, then with the underlying numeric codes
tab varname
tab varname, nolabel
* Summary statistics for numeric variables
summarize
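Coding matters because STATA's logit and logistic treat 0 as a non-event and any other non-missing value as an event, so a yes/no variable stored as 1 = yes, 2 = no contains no zeros and cannot serve as an outcome until it is recoded. The Python sketch below, written outside STATA, mimics what tab varname and tab varname, nolabel reveal and then recodes to 0/1; the variable name smoker, its codes, and its labels are hypothetical.

```python
from collections import Counter

# Hypothetical variable: numeric codes with value labels 1 = "yes", 2 = "no"
value_labels = {1: "yes", 2: "no"}
smoker = [1, 2, 2, 1, 2, 2]

# Like "tab varname" (labels shown) versus "tab varname, nolabel" (codes shown)
print(Counter(value_labels[c] for c in smoker))
print(Counter(smoker))


def recode_to_binary(codes, yes_code, no_code):
    """Map yes_code -> 1 and no_code -> 0; refuse any other code."""
    out = []
    for c in codes:
        if c == yes_code:
            out.append(1)
        elif c == no_code:
            out.append(0)
        else:
            raise ValueError(f"unexpected code {c!r}")
    return out


# As coded there are no zeros, so a 0/1 model would see every case as an event
print(0 in smoker)
smoker01 = recode_to_binary(smoker, yes_code=1, no_code=2)
print(smoker01)
```

In STATA itself the equivalent step would be something like recode smoker (2 = 0), after which the value labels also need to be updated to match the new codes.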

Start Here: Applied Quantitative Methods for Health Research

April 15, 2026

Applied Quantitative Methods for Health Research

This blog organizes teaching materials into a structured course for healthcare research graduate students, epidemiology students, and early-career scholars entering quantitative research. It combines short course summaries, STATA commands, and interpretation guidance built from the DATA with STATA site and the course teaching notes.

How to use this course
Read the module summary.
Watch the matching videos in the STATA YouTube playlist.
Run the STATA commands.
Focus on interpretation and model choice, not just estimation.

Course modules
Module 1: Introduction to Data, Variables, and STATA
Module 2: Foundations of Regression (OLS)
Module 3: Binary Outcomes and Why OLS Fails
Module 4: Regression with Dummy Variables
Module 5: Logistic Regression
Module 6: Odds Ratios vs Risk Ratios
Module 7: Choosing Between Logistic, Log-Binomial, Poisson, and Probit Models
Module 8: Inter...
