How to ensure the properties of the covariance matrix when fitting a multivariate normal model by maximum likelihood?


Suppose I have the following model

$$y_i = f(x_i, \theta) + \varepsilon_i$$

where $y_i \in \mathbb{R}^K$, $x_i$ is a vector of explanatory variables, $\theta$ is the parameter vector of the non-linear function $f$, and $\varepsilon_i \sim N(0, \Sigma)$, where $\Sigma$ is naturally a $K \times K$ matrix.

The goal is, as usual, to estimate $\theta$ and $\Sigma$. The obvious choice is the maximum likelihood method. The log-likelihood for this model (assuming we have a sample $(y_i, x_i)$, $i = 1, \ldots, n$) looks like

$$l(\theta,\Sigma) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (y_i - f(x_i,\theta))'\Sigma^{-1}(y_i - f(x_i,\theta))$$

Now this seems simple: the log-likelihood is specified, so put in the data and use some algorithm for non-linear optimisation. The problem is how to ensure that $\Sigma$ is positive definite. Using, for example, optim in R (or any other non-linear optimisation algorithm) will not guarantee that $\Sigma$ is positive definite.

So the question is how to ensure that Σ stays positive definite? I see two possible solutions:

  1. Reparametrise $\Sigma$ as $\Sigma = R'R$, where $R$ is an upper-triangular or symmetric matrix. Then $\Sigma$ will always be positive definite and $R$ can be unconstrained.

  2. Use profile likelihood. Derive the formulas for $\hat\theta(\Sigma)$ and $\hat\Sigma(\theta)$. Start with some $\theta_0$ and iterate $\hat\Sigma_j = \hat\Sigma(\hat\theta_{j-1})$, $\hat\theta_j = \hat\theta(\hat\Sigma_j)$ until convergence.
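The reparametrisation in option 1 can be sketched as follows. This is only an illustrative translation into Python/numpy (the question uses optim in R; scipy.optimize.minimize plays the same role here), with a made-up toy model $f$ and made-up parameter values:

```python
# Option 1: optimise over an unconstrained upper-triangular factor R and
# reconstruct Sigma = R'R, which is positive (semi-)definite by construction.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K, n = 2, 200

def f(x, theta):
    # toy non-linear regression function (an assumption for illustration)
    return np.column_stack([theta[0] * x, theta[1] * x**2])

theta_true = np.array([1.5, -1.0])
Sigma_true = np.array([[0.5, 0.2], [0.2, 0.3]])
x = rng.uniform(0.0, 1.0, n)
y = f(x, theta_true) + rng.multivariate_normal(np.zeros(K), Sigma_true, n)

tri = np.triu_indices(K)          # the K*(K+1)/2 free entries of R

def unpack(params):
    theta, r = params[:2], params[2:]
    R = np.zeros((K, K))
    R[tri] = r
    return theta, R.T @ R         # Sigma = R'R, PSD by construction

def negloglik(params):
    theta, Sigma = unpack(params)
    resid = y - f(x, theta)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:                 # R singular -> Sigma only semi-definite
        return np.inf
    quad = np.einsum('ij,jk,ik->', resid, np.linalg.inv(Sigma), resid)
    return 0.5 * n * logdet + 0.5 * quad   # constants dropped

start = np.concatenate([np.zeros(2), [1.0, 0.0, 1.0]])   # theta, then R
fit = minimize(negloglik, start, method='Nelder-Mead',
               options={'maxiter': 5000, 'xatol': 1e-9, 'fatol': 1e-9})
theta_hat, Sigma_hat = unpack(fit.x)
```

Whatever values the optimiser tries for the entries of $R$, the resulting $\Sigma$ never leaves the positive semi-definite cone, which is the point of the reparametrisation.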

Is there some other way? And what about these two approaches: will they work, and are they standard? This seems like a pretty standard problem, but a quick search did not give me any pointers. I know that Bayesian estimation would also be possible, but for the moment I would rather not engage in it.

mpiktas
I have the same issue in a Kalman filter algorithm, but the problem is much more complicated and it is not as easy to use the Hamilton trick. I wonder whether a simpler thing to do would be to simply use $\log(\det\Sigma + 1)$. This way I force the code not to throw an error, without changing the solution. This also has the benefit of forcing this term to have the same sign as the final part of the likelihood. Any ideas?
econ_pipo

Answers:


Assuming that, in constructing the covariance matrix, you are automatically taking care of the symmetry issue, your log-likelihood will be $-\infty$ when $\Sigma$ is not positive definite because of the $\log\det\Sigma$ term in the model, right? To prevent a numerical error when $\det\Sigma \le 0$, I would precalculate $\det\Sigma$ and, if it is not positive, make the log-likelihood equal -Inf; otherwise continue. You have to calculate the determinant anyway, so this is not costing you any extra computation.
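A minimal sketch of this guard, assuming a Python/numpy setting with an illustrative likelihood function:

```python
# Guard suggested above: compute det(Sigma) first and return -inf from the
# log-likelihood when it is not positive, so the optimiser rejects the point.
import numpy as np

def gaussian_loglik(resid, Sigma):
    """Log-likelihood of i.i.d. N(0, Sigma) residuals (constants dropped)."""
    n = resid.shape[0]
    det = np.linalg.det(Sigma)
    if det <= 0:                      # cannot be positive definite -> reject
        return -np.inf
    quad = np.einsum('ij,jk,ik->', resid, np.linalg.inv(Sigma), resid)
    return -0.5 * n * np.log(det) - 0.5 * quad

resid = np.array([[0.1, -0.2], [0.3, 0.0]])
ok = gaussian_loglik(resid, np.eye(2))                         # finite value
bad = gaussian_loglik(resid, np.array([[1.0, 2.0],
                                       [2.0, 1.0]]))           # det = -3
```

Here `bad` evaluates to `-inf`, so a maximiser will simply back away from that region of the parameter space.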

Macro

As it turns out, you can use profile maximum likelihood to ensure the necessary properties. One can prove that for given $\hat\theta$, $l(\hat\theta, \Sigma)$ is maximised by

$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i\hat\varepsilon_i',$$

where

$$\hat\varepsilon_i = y_i - f(x_i, \hat\theta).$$

Then it is possible to show that

$$\sum_{i=1}^n (y_i - f(x_i,\hat\theta))'\hat\Sigma^{-1}(y_i - f(x_i,\hat\theta)) = \text{const},$$

hence we only need to maximise

$$l_R(\theta) = -\frac{n}{2}\log\det\hat\Sigma(\theta).$$

Naturally, in this case $\Sigma$ will satisfy all the necessary properties. The proofs are identical to the case when $f$ is linear, which can be found in Time Series Analysis by J. D. Hamilton, page 295, hence I omit them.
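A sketch of this concentration step, with an assumed toy model (Python/numpy for illustration; in R the same would be done with optim over $\theta$ alone):

```python
# Profile (concentrated) likelihood: Sigma is replaced by its closed-form
# maximiser Sigma_hat(theta), so the outer optimisation runs over theta only
# and Sigma_hat is positive semi-definite by construction.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
K, n = 2, 300

def f(x, theta):
    # toy non-linear regression function (an assumption for illustration)
    return np.column_stack([theta[0] * x, theta[1] * x**2])

theta_true = np.array([2.0, -1.0])
Sigma_true = np.array([[0.4, 0.1], [0.1, 0.2]])
x = rng.uniform(0.0, 1.0, n)
y = f(x, theta_true) + rng.multivariate_normal(np.zeros(K), Sigma_true, n)

def sigma_hat(theta):
    resid = y - f(x, theta)
    return resid.T @ resid / n    # (1/n) * sum of eps_i eps_i'

def profile_objective(theta):
    # maximising -n/2 log det Sigma_hat(theta) == minimising log det
    sign, logdet = np.linalg.slogdet(sigma_hat(theta))
    return logdet if sign > 0 else np.inf

fit = minimize(profile_objective, np.zeros(2), method='Nelder-Mead')
theta_hat = fit.x
Sigma_hat = sigma_hat(theta_hat)
```

Note that the quadratic-form term never appears in the objective, exactly because it collapses to a constant once $\hat\Sigma(\theta)$ is substituted in.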

mpiktas

An alternative parameterization of the covariance matrix is in terms of its eigenvalues $\lambda_1, \ldots, \lambda_p$ and $p(p-1)/2$ "Givens" angles $\theta_{ij}$.

That is, we can write

$$\Sigma = G^T \Lambda G$$

where G is orthonormal, and

$$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$$

with $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$.

Meanwhile, $G$ can be parameterized uniquely in terms of $p(p-1)/2$ angles $\theta_{ij}$, where $i = 1, 2, \ldots, p-1$ and $j = i, \ldots, p-1$.[1]

(details to be added)

[1]: Hoffman, Raffenetti, Ruedenberg. "Generalization of Euler Angles to N-Dimensional Orthogonal Matrices". J. Math. Phys. 13, 528 (1972).
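The construction above can be sketched as follows, assuming unconstrained angles and log-eigenvalues (names and setup are illustrative, not from the reference):

```python
# Eigenvalue/Givens-angle parameterisation: an orthogonal G is built as a
# product of p(p-1)/2 plane rotations, and Lambda holds the (exponentiated,
# hence positive) eigenvalues, so Sigma = G' Lambda G is positive definite
# for any unconstrained vector of angles and log-eigenvalues.
import numpy as np

def givens(p, i, j, angle):
    """Rotation by `angle` in the (i, j) coordinate plane of R^p."""
    G = np.eye(p)
    c, s = np.cos(angle), np.sin(angle)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

def sigma_from_angles(log_eigs, angles):
    p = len(log_eigs)
    G = np.eye(p)
    k = 0
    for i in range(p - 1):
        for j in range(i + 1, p):   # p(p-1)/2 plane rotations in total
            G = G @ givens(p, i, j, angles[k])
            k += 1
    Lam = np.diag(np.exp(log_eigs))  # exp keeps all eigenvalues positive
    return G.T @ Lam @ G             # Sigma = G' Lambda G

p = 3
rng = np.random.default_rng(2)
log_eigs = rng.normal(size=p)
angles = rng.normal(size=p * (p - 1) // 2)
Sigma = sigma_from_angles(log_eigs, angles)
```

Since a product of plane rotations is orthogonal, the eigenvalues of the resulting $\Sigma$ are exactly the entries of $\Lambda$.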

charles.y.zheng
The matrix G is actually orthogonal, because Σ is a symmetric matrix. This is the approach I was going to recommend - basically it amounts to rotating the $y_i$ vector and the model function $f(x_i,\theta)$ so that the errors are independent, and then applying OLS to each of the rotated components (I think).
probabilityislogic

Along the lines of charles.y.zheng's answer, you may wish to model $\Sigma = \Lambda + CC'$, where $\Lambda$ is a diagonal matrix and $C$ is a Cholesky factor of a low-rank update to $\Lambda$. You then only need to keep the diagonal of $\Lambda$ positive to keep $\Sigma$ positive definite. That is, you should estimate the diagonal of $\Lambda$ and the elements of $C$ instead of estimating $\Sigma$ directly.
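A minimal sketch of this parameterisation, keeping the diagonal of $\Lambda$ positive via an exponential (an assumption for illustration; any positivity transform would do):

```python
# Diagonal-plus-low-rank parameterisation: Sigma = Lambda + C C'. Since
# C C' is always positive semi-definite, Sigma is positive definite
# whenever the diagonal of Lambda is strictly positive.
import numpy as np

def sigma_from_factors(log_diag, C):
    Lam = np.diag(np.exp(log_diag))  # exp keeps the diagonal positive
    return Lam + C @ C.T             # C C' is PSD, so Sigma is PD

p, r = 4, 2                           # dimension and rank of the update
rng = np.random.default_rng(3)
log_diag = rng.normal(size=p)
C = rng.normal(size=(p, r))           # unconstrained p-by-r factor
Sigma = sigma_from_factors(log_diag, C)
```

This addresses the comment below: the entries of $C$ really can be anything, because the positive diagonal of $\Lambda$ bounds the smallest eigenvalue of $\Sigma$ away from zero.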

shabbychef
Can the below-diagonal elements in this setting be anything I want, as long as the diagonal is positive? When I simulate matrices this way in numpy, not all of them are positive definite.
sztal
Λ is a diagonal matrix.
shabbychef