What is the relationship between the Beta distribution and the logistic regression model?

16

My question is: what is the mathematical relationship between the Beta distribution and the coefficients of a logistic regression model?

To illustrate: the logistic (sigmoid) function is given by

$$f(x) = \frac{1}{1+\exp(-x)}$$

and is used to model probabilities in the logistic regression model. Let $A$ be a dichotomous (0,1) outcome and $X$ a design matrix. The logistic regression model is given by

$$P(A=1 \mid X) = f(X\beta).$$

Note that $X$ has a first column of constant 1s (the intercept) and $\beta$ is a column vector of regression coefficients. For example, with a single standard-normal regressor $x$ and the choice $\beta_0 = 1$ (intercept) and $\beta_1 = 1$, we can simulate the resulting "distribution of probabilities".
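A minimal R sketch of the simulation described above, assuming one standard-normal regressor and $\beta_0 = \beta_1 = 1$. The sample size, the seed, and the variable names are illustrative assumptions, not taken from the post.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
n <- 10000                           # assumed number of simulated observations
x <- rnorm(n)                        # one standard-normal regressor
X <- cbind(1, x)                     # design matrix with an intercept column
beta <- c(1, 1)                      # beta0 = 1 (intercept), beta1 = 1
p <- as.vector(plogis(X %*% beta))   # P(A = 1 | X) = 1 / (1 + exp(-X beta))

# Histogram of the simulated probabilities
hist(p, breaks = 50, probability = TRUE,
     main = "Histogram of P(A = 1 | X)", xlab = "P(A = 1 | X)")
```

`plogis` is base R's logistic function, so no extra packages are needed.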

[Figure: histogram of P(A = 1 | X)]

This plot resembles a Beta distribution (as do the plots for other choices of $\beta$), whose density is given by

$$g(y; p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, y^{p-1} (1-y)^{q-1}.$$

Using maximum likelihood or the method of moments, one can estimate $p$ and $q$ from the distribution of $P(A=1 \mid X)$. So my question comes down to: what is the relationship between the choice of $\beta$ and the values of $p$ and $q$? As a start, this concerns the bivariate case given above.
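As a rough illustration of the method-of-moments step, the R sketch below matches the sample mean and variance of simulated probabilities to those of a Beta distribution. The simulation settings and the names `p_hat` and `q_hat` are assumptions made for this example.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(10000)                # standard-normal regressor (assumed sample size)
prob <- plogis(1 + x)            # P(A = 1 | X) with beta0 = beta1 = 1

# Method of moments for Beta(p, q):
#   mean = p / (p + q),  var = mean * (1 - mean) / (p + q + 1)
m <- mean(prob)
v <- var(prob)
k <- m * (1 - m) / v - 1         # estimate of p + q
p_hat <- m * k
q_hat <- (1 - m) * k
c(p = p_hat, q = q_hat)
```

The estimates change whenever either $\beta$ or the distribution of $x$ changes, which is why they cannot be read off from $\beta$ alone.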

tomka
I was wondering about exactly this 3 hours ago in my Bayesian statistics class
Alchemist

Answers:

16

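Beta is a distribution of values in the range (0,1) that is very flexible in shape, so for almost any unimodal empirical distribution of values in (0,1) you can easily find parameters of a beta distribution that "resembles" its shape.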

Notice that logistic regression provides you with conditional probabilities $\Pr(Y=1 \mid X)$, while your plot shows the marginal distribution of the predicted probabilities. Those are two different things.

There is no direct relation between the logistic regression parameters and the parameters of a beta distribution when looking at the distribution of predictions from a logistic regression model. Below you can see data simulated using normal, exponential and uniform distributions, transformed using the logistic function. Despite using exactly the same logistic regression parameters (i.e. $\beta_0=0$, $\beta_1=1$), the distributions of predicted probabilities are very different. So the distribution of predicted probabilities depends not only on the parameters of the logistic regression but also on the distribution of the $X$'s, and there is no simple relation between them.

[Figure: logistic function of data simulated under normal, exponential and uniform distributions]
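A sketch of how such a comparison could be simulated in R, with $\beta_0 = 0$ and $\beta_1 = 1$ throughout. The specific distribution parameters (standard normal, rate-1 exponential, uniform on (0,1)), the sample size, and the variable names are assumptions, since the answer does not state them.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 10000                         # assumed sample size
beta0 <- 0
beta1 <- 1

# Three different distributions for the regressor (parameters are assumed)
xs <- list(
  normal      = rnorm(n),
  exponential = rexp(n),
  uniform     = runif(n)
)

# Same logistic regression parameters applied to each regressor
preds <- lapply(xs, function(x) plogis(beta0 + beta1 * x))

op <- par(mfrow = c(1, 3))
for (nm in names(preds)) {
  hist(preds[[nm]], breaks = 50, probability = TRUE,
       xlim = c(0, 1), main = nm, xlab = "predicted probability")
}
par(op)

sapply(preds, range)               # the supports already differ
```

With these choices the exponential and uniform regressors are non-negative, so their predicted probabilities never fall below 0.5, while the normal regressor covers both sides.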

Since beta is a distribution of values in (0,1), it cannot be used to model binary data as logistic regression does. It can be used to model probabilities, which is what beta regression does. So if you are interested in how the probabilities (understood as a random variable) behave, you can use beta regression for that purpose.

Tim
So if Beta can approximate any such distribution, shouldn't there be a relationship between its parameters and β?
tomka
4
@tomka but the distribution depends on the distribution of your data and on the parameters, so even if such a relationship exists, it is a very complicated one. There is obviously no direct relationship between the regression parameters and the parameters of the beta distribution. Try simulating logistic regression predictions under the same parameters using different distributions for X; the marginal distribution will differ in each case.
Tim
4
The beta distribution is not that flexible -- it cannot approximate multimodal distributions.
Marcus P S
@MarcusPS I made it more clear.
Tim
1
@MarcusPS except the special case of multimodal distributions with modes at 0 and 1 ...
Ben Bolker
4

Logistic regression is a special case of a Generalized Linear Model (GLM). In this particular case of binary data, the logit (the inverse of the logistic function) is the canonical link function, which turns the non-linear regression problem at hand into a linear one. GLMs are somewhat special, in the sense that they apply only to distributions in the exponential family (such as the Binomial distribution).

In Bayesian estimation, the Beta distribution is the conjugate prior to the binomial distribution, which means that a Bayesian update to a Beta prior, with binomial observations, will result in a Beta posterior. So if you have counts for observations of binary data, you can get an analytical Bayesian estimate of the parameters of the binomial distribution by using a Beta prior.
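The conjugate update can be sketched in a few lines of R. The prior parameters and the counts below are made-up numbers chosen only for illustration.

```r
# Hypothetical Beta(a, b) prior on the success probability
a <- 2
b <- 2

# Hypothetical data: k successes in n binomial trials
n <- 20
k <- 14

# Conjugacy: the posterior is Beta(a + k, b + n - k)
a_post <- a + k
b_post <- b + (n - k)

post_mean <- a_post / (a_post + b_post)            # posterior mean
ci <- qbeta(c(0.025, 0.975), a_post, b_post)       # 95% equal-tailed credible interval
```

No numerical integration is needed here, which is the practical benefit of conjugacy.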

So, along the lines of what has been said by others, I don't think there is a direct relation, but both the Beta distribution and logistic regression are closely related to estimating the parameters of something that follows a binomial distribution.

Marcus P S
1
I already +1'd for mentioning the Bayesian perspective, but notice that in the case of a regression model we do not use the beta-binomial model, and the beta distribution is generally not used as a prior for the parameters, at least in typical Bayesian logistic regression. So this does not translate directly to the beta-binomial model.
Tim
3

Maybe there is no direct connection? The distribution of $P(A=1 \mid X)$ largely depends on how you simulate $X$. If you simulate $X$ from $N(0,1)$, then $\exp(X\beta)$ has a log-normal distribution with $\mu=1$ and $\sigma=1$, given $\beta_0=\beta_1=1$. The distribution of $P(A=1 \mid X)$ can then be found explicitly: with c.d.f.

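$$F(x) = 1 - \Phi\left[\ln\left(\frac{1}{x}-1\right)+1\right],$$
inverse c.d.f.
$$Q(x) = \frac{1}{1+\exp\left(\Phi^{-1}(1-x)-1\right)},$$
and p.d.f.
$$f(x) = \frac{1}{x(1-x)\sqrt{2\pi}} \exp\left(-\frac{\left(\ln\left(\frac{1}{x}-1\right)+1\right)^2}{2}\right),$$
which do not resemble those of a Beta distribution.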
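For readers who want the intermediate steps, here is one way to obtain the c.d.f. and p.d.f. The notation $Z \sim N(0,1)$ for the regressor and $\varphi$ for the standard normal density is introduced here for illustration. For $0 < x < 1$,

$$
\begin{aligned}
F(x) &= \Pr\left(\frac{1}{1+e^{-(1+Z)}} \le x\right)
      = \Pr\left(e^{-(1+Z)} \ge \frac{1}{x}-1\right) \\
     &= \Pr\left(Z \le -\ln\left(\frac{1}{x}-1\right)-1\right)
      = 1-\Phi\left[\ln\left(\frac{1}{x}-1\right)+1\right].
\end{aligned}
$$

Since $\frac{d}{dx}\ln\left(\frac{1}{x}-1\right) = -\frac{1}{x(1-x)}$, differentiating gives

$$
f(x) = F'(x) = \frac{1}{x(1-x)}\,\varphi\left(\ln\left(\frac{1}{x}-1\right)+1\right),
$$

which is the p.d.f. stated above once $\varphi$ is written out.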

You can verify the results given above in R:

n = 100000

X = cbind(rep(1, n), rnorm(n)) # simulate design matrix
Y = 1 / (exp(-X %*% c(1,1)) + 1) # P(A=1|X)

Z1 = 1 / (rlnorm(n, -1, 1) + 1) # simulate from lognormal directly
Z2 = 1 / (1 + exp(qnorm(runif(n)) - 1)) # simulate with inverse CDF

# Kolmogorov–Smirnov test
ks.test(Y, Z1)
ks.test(Y, Z2)

# plot fitted density
new.pdf = function(x) {
  1 / (x * (1 - x) * sqrt(2 * pi)) * exp(-0.5 * (log(1 / x - 1) + 1)^2)
}
hist(Y, breaks = "FD", probability = TRUE)
curve(new.pdf, col = 4, add = TRUE)


Francis
My x is indeed standard-normal (I made an edit). Your density f(x) has support over [-inf, inf], whereas the density of P(A|X) should have support only on [0,1]. In fact your f(x) should be the standard normal. In other words, you have not yet shown the distribution of P(A|X).
tomka
@tomka The logarithm requires 1/x - 1 > 0, so x is in (0,1). Also, f is not the pdf of a standard normal; note the denominator.
Francis
Why would the CLT have any applicability to the distribution of a regressor variable X??
whuber
@whuber: looks like I have mistaken something, I removed that part.
Francis