Exponentiated logistic regression coefficient different than odds ratio

As I understand it, the exponentiated beta value from a logistic regression is the odds ratio of that variable for the dependent variable of interest. However, the value does not match the manually calculated odds ratio. My model predicts stunting (a measure of malnutrition) using, among other indicators, insurance.

// Odds ratio from LR, being done in stata
logit stunting insurance age ... etc. 
or_insurance = exp(beta_value_insurance)

// Odds ratio, manually calculated
odds_stunted_ins = num_stunted_ins/num_not_stunted_ins
odds_stunted_unins = num_stunted_unins/num_not_stunted_unins
odds_ratio = odds_stunted_ins/odds_stunted_unins

What is the conceptual reason for these values being different? Controlling for other factors in the regression? I just want to be able to explain the discrepancy.

regression logistic interpretation odds-ratio

mike

Are you putting additional predictors into the logistic regression model? The manually calculated odds ratio will only match the odds ratio you get from logistic regression if you include no other predictors.

Macro

That's what I figured, but I wanted confirmation. Is that because the regression result accounts for variation in the other predictors?

mike

Yes, @mike. Assuming the model is correctly specified, you can interpret it as the odds ratio when all other predictors are held fixed.

Macro

@Macro: would you mind restating your comment as an answer?

jrennie

Answers:

If you put only that lone predictor into the model, then the odds ratio between the predictor and the response will be exactly equal to the exponentiated regression coefficient. I don't think a derivation of this result is present on the site, so I will take this opportunity to provide it.

Consider a binary outcome $Y$ and a single binary predictor $X$:

$\begin{array}{c|cc} \phantom{} & Y = 1 & Y = 0 \\ \hline X=1 & p_{11} & p_{10} \\ X=0 & p_{01} & p_{00} \\ \end{array}$

Then, one way to calculate the odds ratio between $X_i$ and $Y_i$ is

${\rm OR} = \frac{ p_{11} p_{00} }{p_{01} p_{10}}$

By definition of conditional probability, $p_{ij} = P(Y = j | X = i) \cdot P(X = i)$, where the first index is the value of $X$ and the second is the value of $Y$, as in the table. In the ratio, the marginal probabilities involving $X$ cancel out and you can rewrite the odds ratio in terms of the conditional probabilities of $Y|X$:

${\rm OR} = \frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)}$

In logistic regression, you model these probabilities directly:

$\log \left( \frac{ P(Y_i = 1|X_i) }{ P(Y_i = 0|X_i) } \right) = \beta_0 + \beta_1 X_i$

So we can calculate these conditional probabilities directly from the model. The first ratio in the expression for ${\rm OR}$ above is:

$\frac{ P(Y_i = 1| X_i = 1) }{P(Y_i = 0 | X_i = 1)} = \frac{ \left( \frac{1}{1 + e^{-(\beta_0+\beta_1)}} \right) } {\left( \frac{e^{-(\beta_0+\beta_1)}}{1 + e^{-(\beta_0+\beta_1)}}\right)} = \frac{1}{e^{-(\beta_0+\beta_1)}} = e^{(\beta_0+\beta_1)}$

and the second is:

$\frac{ P(Y_i = 0| X_i = 0) }{P(Y_i = 1 | X_i = 0)} = \frac{ \left( \frac{e^{-\beta_0}}{1 + e^{-\beta_0}} \right) } { \left( \frac{1}{1 + e^{-\beta_0}} \right) } = e^{-\beta_0}$

Multiplying the two ratios together gives the result:

${\rm OR} = e^{(\beta_0+\beta_1)} \cdot e^{-\beta_0} = e^{\beta_1}$
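The algebra above can be checked numerically. The following is a minimal sketch, not part of the original answer: the coefficient values are arbitrary illustrative choices, and the helper names `sigmoid` and `odds_ratio_from_model` are hypothetical.

```python
import math


def sigmoid(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def odds_ratio_from_model(beta0, beta1):
    """Odds ratio between a binary X and Y implied by
    logit P(Y=1 | X) = beta0 + beta1 * X."""
    p1 = sigmoid(beta0 + beta1)  # P(Y = 1 | X = 1)
    p0 = sigmoid(beta0)          # P(Y = 1 | X = 0)
    # First ratio: odds of Y=1 given X=1; second ratio: inverse odds given X=0.
    return (p1 / (1.0 - p1)) * ((1.0 - p0) / p0)


# Arbitrary illustrative coefficients (assumed, not taken from any data set).
beta0, beta1 = -1.2, 0.7
print(odds_ratio_from_model(beta0, beta1), math.exp(beta1))
```

Up to floating-point error the two printed numbers agree for any choice of `beta0`, which is the cancellation of $e^{\beta_0}$ in the final step.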

Note 1: When the model contains other predictors, call them $Z_1, ..., Z_p$, the exponentiated regression coefficient is (by a similar derivation) actually

$\frac{ P(Y = 1| X = 1, Z_1, ..., Z_p) }{P(Y = 0 | X = 1, Z_1, ..., Z_p)} \cdot \frac{ P(Y = 0 | X = 0, Z_1, ..., Z_p) }{ P(Y = 1 | X = 0, Z_1, ..., Z_p)}$

so it is the odds ratio conditional on the values of the other predictors in the model and, in general, is not equal to

$\frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)}$

So, it is no surprise that you're observing a discrepancy between the exponentiated coefficient and the observed odds ratio.
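To make the gap concrete, here is a small sketch with one extra binary predictor $Z$ that is associated with $X$. Everything in it is an assumption chosen for illustration: the coefficients `B0`, `B1`, `B2`, the distribution of $Z$, and the dependence of $X$ on $Z$ are made up, and the probabilities are computed exactly from the assumed model rather than estimated from data.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def odds(p):
    return p / (1.0 - p)


# Assumed model: logit P(Y=1 | X, Z) = B0 + B1*X + B2*Z (illustrative values).
B0, B1, B2 = -1.0, 0.5, 2.0
P_Z1 = 0.5                       # assumed P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}  # assumed P(X = 1 | Z = z), so X and Z are associated


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z) under the assumed model."""
    return sigmoid(B0 + B1 * x + B2 * z)


def conditional_or(z):
    """Odds ratio for X with Z held fixed at z."""
    return odds(p_y1(1, z)) / odds(p_y1(0, z))


def marginal_p_y1(x):
    """P(Y = 1 | X = x), averaging over the distribution of Z given X = x."""
    num, den = 0.0, 0.0
    for z in (0, 1):
        p_z = P_Z1 if z == 1 else 1.0 - P_Z1
        p_x_given_z = P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]
        joint_xz = p_x_given_z * p_z  # P(X = x, Z = z)
        num += p_y1(x, z) * joint_xz
        den += joint_xz
    return num / den


marginal_or = odds(marginal_p_y1(1)) / odds(marginal_p_y1(0))
print("conditional OR (Z=0):", conditional_or(0))
print("conditional OR (Z=1):", conditional_or(1))
print("exp(B1):             ", math.exp(B1))
print("marginal OR:         ", marginal_or)
```

Under these assumed numbers the conditional odds ratio equals `exp(B1)` at both levels of $Z$, while the marginal odds ratio, which is what a hand calculation from the two-by-two table of $X$ and $Y$ estimates, comes out noticeably larger.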

Note 2: I derived a relationship between the true $\beta$ and the true odds ratio but note that the same relationship holds for the sample quantities since the fitted logistic regression with a single binary predictor will exactly reproduce the entries of a two-by-two table. That is, the fitted means exactly match the sample means, as with any GLM. So, all of the logic used above applies with the true values replaced by sample quantities.
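The sample version of this statement can be checked with a hand-rolled fit. The sketch below is an illustration rather than what Stata or SAS do internally: it runs plain Newton-Raphson on the grouped log-likelihood for a model with an intercept and one binary predictor, using made-up cell counts, and the function name `fit_logit_binary_predictor` is hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit_binary_predictor(n11, n10, n01, n00, iterations=25):
    """Maximum-likelihood fit of logit P(Y=1 | X) = b0 + b1*X by Newton-Raphson.

    Cell counts follow the table's indexing: first index is X, second is Y
    (n11: X=1,Y=1; n10: X=1,Y=0; n01: X=0,Y=1; n00: X=0,Y=0).
    """
    s1, n1 = n11, n11 + n10  # successes and total in the X = 1 row
    s0, n0 = n01, n01 + n00  # successes and total in the X = 0 row
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p0 = sigmoid(b0)
        p1 = sigmoid(b0 + b1)
        # Score vector (gradient of the log-likelihood).
        g0 = (s0 - n0 * p0) + (s1 - n1 * p1)
        g1 = s1 - n1 * p1
        # Information matrix [[w0 + w1, w1], [w1, w1]] and its determinant.
        w0 = n0 * p0 * (1.0 - p0)
        w1 = n1 * p1 * (1.0 - p1)
        det = w0 * w1
        # Newton step: add inverse(information) times score.
        b0 += (w1 * g0 - w1 * g1) / det
        b1 += (-w1 * g0 + (w0 + w1) * g1) / det
    return b0, b1


# Hypothetical counts, chosen only for illustration.
n11, n10, n01, n00 = 30, 70, 15, 85
b0, b1 = fit_logit_binary_predictor(n11, n10, n01, n00)
sample_or = (n11 * n00) / (n01 * n10)
print(math.exp(b1), sample_or)
```

The fitted probabilities `sigmoid(b0)` and `sigmoid(b0 + b1)` reproduce the row proportions 15/100 and 30/100, which is why `exp(b1)` coincides with the cross-product ratio of the table.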

Macro

Wow, thanks for taking the time to write out such a complete explanation.

mike

@Macro I found that "p-value being less than 0.05" and "95% CI does not include 1" are not consistent in logistic regression (I used SAS). Is this phenomenon related to your explanation?

user67275

You have a really nice answer from @Macro (+1), who has pointed out that the simple (marginal) odds ratio calculated without reference to a model and the odds ratio taken from a multiple logistic regression model ( $\exp(\beta)$ ) are in general not equal. I wonder if I can still contribute a little bit of related information here, in particular explaining when they will and will not be equal.

Beta values in logistic regression, like in OLS regression, specify the ceteris paribus change in the parameter governing the response distribution associated with a 1-unit change in the covariate. (For logistic regression, this is a change in the logit of the probability of 'success', whereas for OLS regression it is the mean, $\mu$ .) That is, it is the change all else being equal. Exponentiated betas are similarly ceteris paribus odds ratios. Thus, the first issue is to be sure that it is possible for this to be meaningful. Specifically, the covariate in question should not exist in other terms (e.g., in an interaction, or a polynomial term) elsewhere in the model. (Note that here I am referring to terms that are included in your model, but there are also problems if the true relationship varies across levels of another covariate but an interaction term was not included, for example.) Once we've established that it's meaningful to calculate an odds ratio by exponentiating a beta from a logistic regression model, we can ask the questions of when will the model-based and marginal odds ratios differ, and which should you prefer when they do?

The reason that these ORs will differ is because the other covariates included in your model are not orthogonal to the one in question. For example, you can check by running a simple correlation between your covariates (it doesn't matter what the p-values are, or if your covariates are $0/1$ instead of continuous, the point is simply that $r\ne0$ ). On the other hand, when all of your other covariates are orthogonal to the one in question, $\exp(\beta)$ will equal the marginal OR.

If the marginal OR and the model-based OR differ, you should use / interpret the model-based version. The reason is that the marginal OR does not account for the confounding amongst your covariates, whereas the model does. This phenomenon is related to Simpson's Paradox, which you may want to read about (SEP also has a good entry, there is a discussion on CV here: Basic-simpson's-paradox, and you can search on CV's simpsons-paradox tag). For the sake of simplicity and practicality, you may want to just only use the model based OR, since it will be either clearly preferable or the same.

gung - Reinstate Monica