What I Read This Week

Hidden Markov Models in Election Polling.

Bias and Excess Variance in Election Polling:

  1. Overview
    • The authors are interested in polling errors
    • Poll errors are extremely sensitive to the time between the poll and the actual election outcome
    • They propose a hidden Markov model to capture time-varying preferences and treat the election result as a peek at the typically hidden process.
    • CLAIM: Their solution is much less sensitive to the time window, avoids conflating changes in preferences with polling errors, and is interpretable
    • They compare their model with an established 2018 model by Shirani-Mehr et al., which is a linear model, as well as with a simple model that has neither an intercept nor time dependence.
    • MAIN ISSUE: The existing methods give inconsistent results across inclusion windows, i.e. the estimated amount by which polls overstate a candidate's support changes with how many days of polling are included (support overstated depends on X days polled).
  2. Model Issues
    • They mislabel changes in preferences as polling errors with high precision because they don't account for model misspecification [1]
    • Certain implementations also require log and logit transformations, which can cause directional error
    • Model Advantages
      1. Consistency across time windows
      2. Avoid conflating changes in preferences with polling errors
      3. Interpretability
  3. Models
    • Specification
      (a) $i$ = poll
      (b) $r_i$ = election (the election that poll $i$ belongs to)
      (c) $y_i$ = proportion of the sample intending to vote Republican, out of Republican/Democrat
      (d) $n_i$ = number of two-party voters
      (e) $v_{r_i}$ = Republican portion of the two-party vote
      (f) ASSUME: $y_i \sim N(p_i, \sigma_i^2)$
      (g) THE MODELS DIFFER BASED ON HOW $p_i$, THE TRUE UNDERLYING PREFERENCE IN POLL $i$, IS DECOMPOSED
    • M1: Static Model
      (a) $p_i := v_{r_i} + \alpha_{r_i}$
      (b) $y_i - v_{r_i} \sim N(\alpha_{r_i}, \sigma_i^2)$
      (c) This assumes the electorate's preferences don't change over time and that the error $\alpha_{r_i}$ is a time-invariant, election-specific error. It also means you only have to use polls taken very close to the election.
    • M2: A Linear Model
      (a) $\operatorname{logit}(p_i) = \operatorname{logit}(v_{r_i}) + \alpha_{r_i} + \beta_{r_i} t_i$
      (b) The error is now defined as the sum of the time-invariant error $\alpha_{r_i}$ and a term that is linear in time, $\beta_{r_i} t_i$.
    • Random Walk Model
      • Specification
        1. The proposed model sets $p_i = \theta_{r_i,t_i} + \alpha_{r_i}$, where $\theta_{r,t}$ represents the electorate's preferences at time point $t$ and evolves via a reverse random walk, i.e. not from $0 \to T$ but from $T \to 0$: $\theta_{r,t+1} \sim N(\theta_{r,t}, \gamma_r^2)$. The election result therefore reveals $\theta_{r,0} := v_r$.
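        2. Formal Statement
        3. $y_i \sim N\left(p_i,\; \frac{p_i(1 - p_i)}{n_i} + \tau_{r_i}^2\right)$
           $p_i = \min(1, \max(0, \theta_{r_i,t_i} + \alpha_{r_i}))$
           $\theta_{r,t+1} \sim N(\theta_{r,t}, \gamma_r^2)$
           $\theta_{r,0} := v_r$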
        4. $\tau_r$ captures the election-specific variance above random sampling (it enters the model as $\tau_r^2$). $\gamma_r$ measures how much the electorate's preferences can change from day to day: under the normal random walk, about 95% of daily shifts fall within plus or minus $2\gamma_r$. $\alpha_r$ directly measures poll bias. The min/max operator in $p_i$ ensures that $p_i$ lies within $[0, 1]$.
        5. The election-specific scalar parameters $\tau_r$, $\alpha_r$, and $\gamma_r$ have hierarchical normal or half-normal priors, which lets the model borrow strength across elections.
        6. $\tau_r \sim N^{+}(0, \sigma_\tau^2)$
           $\alpha_r \sim N(\mu_\alpha, \sigma_\alpha^2)$
           $\gamma_r \sim N(0, \sigma_\gamma^2)$
        7. Weakly informative priors are placed on the hyperparameters. The model was estimated using Stan with Hamiltonian Monte Carlo.
        8. Estimation: with a normal prior for $\alpha_r$, it obtains
           $\alpha_r \mid y_i, \tau_r, \gamma_r \sim N\left(w_i (y_i - v_r) + (1 - w_i)\mu_\alpha,\; \lambda_i^{-1}\right)$
           $\lambda_i = (\tau_r^2 + t_i \gamma_r^2)^{-1} + \sigma_\alpha^{-2}$
           $w_i = (\tau_r^2 + t_i \gamma_r^2)^{-1} / \lambda_i$
           Effectively, the model gives weight $w_i$ to the observed error and weight $1 - w_i$ to the prior mean. A poll taken farther from the election gets less weight than one taken closer to the election.
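To make the weighting in the estimation step concrete, here is a minimal Python sketch of the conditional posterior for the bias term given one poll. It assumes the weight is the share of the posterior precision contributed by the poll, and it ignores the sampling variance and the clipping to [0, 1], as the formula in the notes does. The function name `posterior_alpha` and all numeric values are hypothetical, chosen only for illustration.

```python
def posterior_alpha(y, v, t, tau, gamma, mu_alpha, sigma_alpha):
    """Conditional posterior of the election-level bias alpha_r given one poll.

    y: poll share, v: election result, t: days between poll and election,
    tau: excess std. dev., gamma: daily random-walk std. dev.,
    mu_alpha / sigma_alpha: prior mean and std. dev. of alpha_r.
    Returns (posterior mean, posterior variance, weight on the observed error).
    """
    # Variance of the observed error (y - v) around alpha_r:
    # excess variance plus t days of accumulated random-walk variance.
    data_var = tau ** 2 + t * gamma ** 2  # must be > 0
    data_prec = 1.0 / data_var
    # Posterior precision = data precision + prior precision.
    lam = data_prec + 1.0 / sigma_alpha ** 2
    # Assumption: the weight is the data's share of the total precision.
    w = data_prec / lam
    mean = w * (y - v) + (1.0 - w) * mu_alpha
    return mean, 1.0 / lam, w


if __name__ == "__main__":
    # Hypothetical poll: 53% Republican two-party share, actual result 50%.
    for days_out in (1, 30, 90):
        mean, var, w = posterior_alpha(
            y=0.53, v=0.50, t=days_out,
            tau=0.02, gamma=0.005, mu_alpha=0.0, sigma_alpha=0.02,
        )
        print(f"t={days_out:3d}  weight={w:.3f}  mean={mean:.4f}  sd={var ** 0.5:.4f}")
```

Because the random-walk variance grows linearly in the number of days, the weight on the observed error shrinks as the poll moves away from election day, and the posterior mean is pulled toward the prior mean.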
  4. Summary
    • This model better captured turbulent election cycles and gives more interpretable results through the weights. Its error estimates are also robust to the choice of how many days of polls to include.
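As a rough way to explore the inclusion-window point, the sketch below simulates polls from the random walk specification in section 3 and computes a naive static-style bias estimate (the average of poll share minus election result) for different windows. It is an illustration under stated assumptions, not the paper's code: the function names `simulate_election` and `naive_bias`, the one-step-per-day indexing, and every parameter value are hypothetical.

```python
import random


def simulate_election(v, alpha, gamma, tau, n_days, polls_per_day, n_resp, seed=0):
    """Simulate one election under the reverse random walk model.

    theta[t] is the electorate's preference t days before the election,
    with theta[0] fixed to the election result v.
    Returns (theta, polls), where polls is a list of (t, y) pairs.
    """
    rng = random.Random(seed)
    theta = [v]
    for _ in range(n_days):
        # Walk backwards in time from election day.
        theta.append(rng.gauss(theta[-1], gamma))
    polls = []
    for t in range(1, n_days + 1):
        for _ in range(polls_per_day):
            # Expected poll share: preference plus bias, clipped to [0, 1].
            p = min(1.0, max(0.0, theta[t] + alpha))
            # Sampling variance plus election-specific excess variance.
            sd = (p * (1.0 - p) / n_resp + tau ** 2) ** 0.5
            polls.append((t, rng.gauss(p, sd)))
    return theta, polls


def naive_bias(polls, v, window):
    """Average poll error (y - v) over polls taken within `window` days (window >= 1)."""
    errors = [y - v for t, y in polls if t <= window]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical election: 50% result, polls biased by +2 points.
    theta, polls = simulate_election(
        v=0.50, alpha=0.02, gamma=0.005, tau=0.01,
        n_days=90, polls_per_day=2, n_resp=800, seed=1,
    )
    for window in (7, 30, 90):
        print(f"window={window:2d} days  naive bias estimate={naive_bias(polls, 0.50, window):+.4f}")
```

The naive estimate mixes the true bias with whatever drift in preferences happened inside the window, which is the conflation the random walk model is meant to separate out. Changing the seed or `gamma` shows how much the estimate can move with the window.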