Sufficient and necessary conditions for a zero eigenvalue of a correlation matrix

11

Given $n$ random variables $X_i$ with probability distribution $P(X_1,\ldots,X_n)$, the correlation matrix $C_{ij}=E[X_i X_j]-E[X_i]E[X_j]$ is positive semi-definite, i.e. its eigenvalues are positive or zero.

I am interested in the conditions on $P$ that are necessary and/or sufficient for $C$ to have $m$ zero eigenvalues. For instance, a sufficient condition is that the random variables are not independent: $\sum_i u_i X_i=0$ for some real numbers $u_i$. For example, if $P(X_1,\ldots,X_n)=\delta(X_1-X_2)p(X_2,\ldots,X_n)$, then $\vec u=(1,-1,0,\ldots,0)$ is an eigenvector of $C$ with zero eigenvalue. If we have $m$ independent linear constraints of this type on the $X_i$, this would imply $m$ zero eigenvalues.
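The $\delta(X_1-X_2)$ example can be checked numerically. The following is only a minimal sketch: the Gaussian samples, the sample size and the seed are arbitrary illustrative choices, and the sample covariance from NumPy stands in for $C$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_samples = 10_000
x2 = rng.normal(size=n_samples)
x3 = rng.normal(size=n_samples)
x1 = x2.copy()  # the constraint X_1 = X_2, i.e. P contains delta(X_1 - X_2)

# Rows are variables, columns are observations (np.cov's default layout).
C = np.cov(np.vstack([x1, x2, x3]))

u = np.array([1.0, -1.0, 0.0])
Cu = C @ u                            # should vanish up to rounding
eigenvalues = np.linalg.eigvalsh(C)   # returned in ascending order

print(Cu)
print(eigenvalues)
```

The smallest eigenvalue is zero up to floating-point rounding, while the other two stay well away from zero.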

There is at least one additional (but trivial) possibility, when $X_a=E[X_a]$ for some $a$ (i.e. $P(X_1,\ldots,X_n)\propto\delta(X_a-E[X_a])$), since in that case $C_{ij}$ has a column and a row of zeros: $C_{ia}=C_{ai}=0,\,\forall i$. As this is not really interesting, I am assuming that the probability distribution is not of that form.

My question is: are linear constraints the only way to induce zero eigenvalues (if we forbid the trivial exception given above), or can non-linear constraints on the random variables also generate zero eigenvalues of $C$?

correlation Adam

1

By definition, a set of vectors that includes the zero vector is linearly dependent, so your additional possibility is nothing new or different. Could you please explain what you mean by "having an $m$ eigenvalue"? That looks like a typographical error.

whuber

@whuber: yes, typo. Corrected. I think the two conditions are different: one is about the relationship between the variables, while the other is about the probability of a single variable only (namely $p(X_a)=\delta(X_a-E(X_a))$).

Adam

The formulation of your question is confusing. It looks like an elementary theorem of linear algebra, but the references to "independent" random variables suggest it might be about something else altogether. Would it be correct to understand that every time you use "independent" you mean it in the sense of linear independence and not in the sense of (statistically) independent random variables? Your reference to "missing data" is even more confusing, because it suggests that your "random variables" might really mean just columns of a data matrix. It would be good to see these meanings clarified.

whuber

@whuber: I have edited the question. Hopefully it is clearer.

Adam

The condition $\sum_i u_i X_i=0$ does not necessarily need to equal zero (any constant will do), unless the mean of each $X_i$ is zero.

Sextus Empiricus

6

Perhaps by simplifying the notation we can bring out the essential ideas. It turns out we don't need to involve expectations or complicated formulas, because everything is purely algebraic.

The algebraic nature of the mathematical objects

The question concerns relationships between (1) the covariance matrix of a finite set of random variables $X_1, \ldots, X_n$ and (2) linear relations among those variables, considered as vectors.

The vector space in question is the set of all finite-variance random variables (on any given probability space $(\Omega,\mathbb P)$) modulo the subspace of almost surely constant variables, denoted $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R.$ (That is, we consider two random variables $X$ and $Y$ to be the same vector when there is zero chance that $X-Y$ differs from its expectation.) We are dealing only with the finite-dimensional vector space $V$ generated by the $X_i,$ which is what makes this an algebraic problem rather than an analytic one.

What we need to know about variances

$V$ is more than just a vector space, though: it is a quadratic module, because it comes equipped with the variance. All we need to know about variances are two things:

The variance is a scalar-valued function $Q$ with the property that $Q(aX)=a^2Q(X)$ for all vectors $X.$
The variance is nondegenerate.

The second needs some explanation. $Q$ determines a "dot product," which is a symmetric bilinear form given by

$X\cdot Y = \frac{1}{4}\left(Q(X+Y) - Q(X-Y)\right).$

(This is of course nothing other than the covariance of the variables $X$ and $Y.$) Vectors $X$ and $Y$ are orthogonal when their dot product is $0.$ The orthogonal complement of any set of vectors $\mathcal A \subset V$ consists of all vectors orthogonal to every element of $\mathcal A,$ written

$\mathcal{A}^0 = \{v\in V\mid a \cdot v = 0\text{ for all }a \in \mathcal A\}.$

It is clearly a vector space. When $V^0 = \{0\}$, $Q$ is nondegenerate.
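As a quick sanity check of the two properties used here, the polarization formula can be compared with a directly computed covariance. This is just an illustrative sketch: the function name `Q`, the simulated data and the use of the sample variance (with `ddof=1`) in place of the true variance are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated samples standing in for the random variables X and Y.
x = rng.normal(size=5_000)
y = 0.6 * x + rng.normal(size=5_000)

def Q(z):
    """Sample variance, playing the role of the quadratic form Q."""
    return np.var(z, ddof=1)

dot_xy = 0.25 * (Q(x + y) - Q(x - y))   # polarization formula for X . Y
cov_xy = np.cov(x, y)[0, 1]             # direct sample covariance (ddof=1)
print(dot_xy, cov_xy)

# Homogeneity of degree two: Q(aX) = a^2 Q(X)
a = -3.0
print(Q(a * x), a**2 * Q(x))
```

Both printed pairs agree up to rounding, because the identities hold exactly for sample variances as well as for population variances.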

Allow me to prove that the variance is indeed nondegenerate, even though it might seem obvious. Suppose $X$ is a nonzero element of $V^0.$ This means $X\cdot Y = 0$ for all $Y\in V;$ equivalently,

$Q(X+Y) = Q(X-Y)$

for all vectors $Y.$ Taking $Y=X$ gives

$4 Q(X) = Q(2X) = Q(X+X) = Q(X-X) = Q(0) = 0$

and thus $Q(X)=0.$ However, we know (using Chebyshev's Inequality, perhaps) that the only random variables with zero variance are almost surely constant, which identifies them with the zero vector in $V,$ QED.

Interpreting the questions

Returning to the questions, in the preceding notation the covariance matrix of the random variables is just a regular array of all their dot products,

$T = (X_i\cdot X_j).$

There is a good way to think about $T$ : it defines a linear transformation on $\mathbb{R}^n$ in the usual way, by sending any vector $x=(x_1, \ldots, x_n)\in\mathbb{R}^n$ into the vector $T(x)=y=(y_1, \ldots, y_n)$ whose $i^\text{th}$ component is given by the matrix multiplication rule

$y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j.$

The kernel of this linear transformation is the subspace it sends to zero:

$\operatorname{Ker}(T) = \{x\in \mathbb{R}^n\mid T(x)=0\}.$

The foregoing equation implies that when $x\in \operatorname{Ker}(T),$ for every $i$

$0 = y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j = X_i \cdot \left(\sum_j x_j X_j\right).$

Since this is true for every $i,$ it holds for all vectors spanned by the $X_i$ : namely, $V$ itself. Consequently, when $x\in\operatorname{Ker}(T),$ the vector given by $\sum_j x_j X_j$ lies in $V^0.$ Because the variance is nondegenerate, this means $\sum_j x_j X_j = 0.$ That is, $x$ describes a linear dependency among the $n$ original random variables.

You can readily check that this chain of reasoning is reversible:

Linear dependencies among the $X_j$ as vectors are in one-to-one correspondence with elements of the kernel of $T.$

(Remember, this statement still considers the $X_j$ as defined up to a constant shift in location--that is, as elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R$ --rather than as just random variables.)
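The correspondence between $\operatorname{Ker}(T)$ and linear dependencies can be illustrated with simulated data. The sketch below is an assumption-laden example, not part of the argument: the four variables, the two dependencies built into them, and the tolerance used to decide which eigenvalues count as zero are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 20_000

# Two free variables and two that depend linearly on them.
x1 = rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
x3 = x1 + 2.0 * x2 + 5.0      # X_3 = X_1 + 2 X_2 + constant
x4 = x1 - x2                  # X_4 = X_1 - X_2
X = np.vstack([x1, x2, x3, x4])

T = np.cov(X)
eigenvalues, eigenvectors = np.linalg.eigh(T)   # ascending eigenvalues

# Treat eigenvalues below a relative tolerance as zero (an arbitrary cutoff).
tol = 1e-8 * eigenvalues.max()
kernel = eigenvectors[:, np.abs(eigenvalues) < tol]   # basis of Ker(T)

# Each kernel vector x gives a combination sum_j x_j X_j with zero variance.
combo_variances = np.var(kernel.T @ X, axis=1)

print(kernel.shape[1])      # number of (numerically) zero eigenvalues
print(combo_variances)
```

Note that the constant $5$ added to $X_3$ does not matter, in line with the remark that the $X_j$ are only defined up to a constant shift.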

Finally, by definition, an eigenvalue of $T$ is any scalar $\lambda$ for which there exists a nonzero vector $x$ with $T(x) = \lambda x.$ When $\lambda=0$ is an eigenvalue, the space of associated eigenvectors is (obviously) the kernel of $T.$

Summary

We have arrived at the answer to the questions: the set of linear dependencies of the random variables, qua elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R,$ corresponds one-to-one with the kernel of their covariance matrix $T.$ This is so because the variance is a nondegenerate quadratic form. The kernel also is the eigenspace associated with the zero eigenvalue (or just the zero subspace when there is no zero eigenvalue).

Reference

I have largely adopted the notation and some of the language of Chapter IV in

Jean-Pierre Serre, A Course In Arithmetic. Springer-Verlag 1973.

whuber

[...] $X_j$ [...] $\vec X=(X_1,\ldots,X_n)$), or do you? If I'm right, I'm guessing that you are collecting the possible values of the random variable $X_i$ into a vector, while the probability distribution is hidden in the definition of the variance, right?

Adam

I think the main aspect that is not quite clear is the following (which might just show my lack of formal knowledge of probability theory): you seem to show that if there is a 0 eigenvalue, then we have e.g. $X_1=X_2$. This constraint does not refer to the probability distribution $P$, which is hidden in $Q$ (I think this is the clever point about this demonstration). But what does it mean to have $X_1=X_2$ without reference to $P$? Or does it just imply that $P\propto \delta(X_1-X_2)$, but then how do we know that it must be a linear combination of $X_1$ and $X_2$ in the delta function?

Adam

I'm afraid I don't understand your use of a "delta function" in this context, Adam. That is partly because I see no need for it and partly because the notation is ambiguous: would that be a Kronecker delta or a Dirac delta, for instance?

whuber

It would be a Kronecker or a Dirac delta depending on the variables (discrete or continuous). These deltas could be part of the integration measure, e.g. I integrate over 2-by-2 matrices $M$ (so four real variables $X_1$, $X_2$, $X_3$ and $X_4$), with some weight (say $P=\exp(-\operatorname{tr}(M M^T))$), or I integrate over a sub-group. If it is symmetric matrices (implying for instance $X_2=X_3$), I can formally impose that by multiplying $P$ by $\delta(X_2-X_3)$. This would be a linear constraint. An example of a non-linear constraint is given in the comments below Martijn Weterings's answer.

Adam

(continued) The question is: what kind of non-linear constraints that I can add on my variables can induce a 0 eigenvalue? By your answers, it seems to be: only non-linear constraints that imply a linear constraint (as exemplified in the comments below Martijn Weterings's answer). Maybe the problem is that my way of thinking about the problem is from a physicist's point of view, and I struggle to explain it in a different language (I think here is the right place to ask this question, not physics.SE).

Adam

5

Linear dependence is not just a sufficient but also a necessary condition

To show that the variance-covariance matrix has eigenvalues equal to zero if and only if the variables are not linearly independent, it only remains to be shown that "if the matrix has eigenvalues equal to zero then the variables are not linearly independent".

If you have a zero eigenvalue for $C_{ij} = \text{Cov}(X_i,X_j)$ then there is some linear combination (defined by the eigenvector $v$ )

$Y = \sum_{i=1}^n v_i (X_i)$

such that

$\begin{array}{rcl} \text{Cov}(Y,Y) &=& \sum_{i=1}^n \sum_{j=1}^n v_i v_j \text{Cov}(X_i,X_j) \\ &=&\sum_{i=1}^n v_i\sum_{j=1}^n v_j C_{ij} \\ &= &\sum_{i=1}^n v_i \cdot 0 \\ &=& 0 \end{array}$

which means that $Y$ needs to be a constant and thus the variables $X_i$ have to add up to a constant and are either constants themselves (the trivial case) or not linearly independent.

- The first line in the equation with $\text{Cov}(Y,Y)$ is due to the bilinearity property of covariance

$\scriptsize\text{Cov}(aU+bV,cW+dX) = ac\,\text{Cov}(U,W) + bc\,\text{Cov}(V,W) +ad\, \text{Cov}(U,X) + bd \,\text{Cov}(V,X)$

- The step from the second to the third line is due to the property of a zero eigenvalue

$\scriptsize \sum_{j=1}^nv_jC_{ij} = 0$
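The first step of the derivation, $\text{Cov}(Y,Y)=\sum_i\sum_j v_i v_j C_{ij}$, holds for any coefficient vector, not only for eigenvectors. A small numerical check, assuming simulated data and an arbitrary vector `v` picked for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three correlated variables built from a random mixing matrix (illustrative only).
A = rng.normal(size=(3, 3))
X = A @ rng.normal(size=(3, 50_000))
C = np.cov(X)

v = np.array([0.5, -1.0, 2.0])   # an arbitrary coefficient vector
Y = v @ X                        # Y = sum_i v_i X_i, one value per sample

# Explicit double sum over i and j, as in the first line of the derivation.
double_sum = sum(v[i] * v[j] * C[i, j] for i in range(3) for j in range(3))
var_Y = np.var(Y, ddof=1)        # Cov(Y, Y), same normalisation as np.cov

print(double_sum, var_Y)
```

When `v` is an eigenvector with eigenvalue zero, the inner sum over `j` vanishes for every `i`, which is the second step of the derivation.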

Non-linear constraints

So, since linear constraints are a necessary condition (not just sufficient), non-linear constraints will only be relevant when they indirectly imply a (necessary) linear constraint.

In fact, there is a direct correspondence between the eigenvectors associated with the zero eigenvalue and the linear constraints.

$C \cdot v = 0 \iff Y = \sum_{i=1}^n v_i X_i = \text{const}$

Thus non-linear constraints leading to a zero eigenvalue must, together combined, generate some linear constraint.

How can non-linear constraints lead to linear constraints

Your example in the comments shows intuitively how non-linear constraints can lead to linear constraints, by reversing the derivation. The following non-linear constraints

$\begin{array}{lcr} a^2+b^2&=&1\\ c^2+d^2&=&1\\ ac + bd &=& 0 \\ ad - bc &=& 1 \end{array}$

can be reduced to

$\begin{array}{lcr} a^2+b^2&=&1\\ c^2+d^2&=&1\\ a-d&=&0 \\ b+c &=& 0 \end{array}$

You could invert this. Say you have non-linear plus linear constraints; then it is not strange to imagine how we can replace one of the linear constraints with a non-linear constraint, by substituting the linear constraints into the non-linear constraints. E.g. when we substitute $a=d$ and $b=-c$ in the non-linear form $a^2+b^2=1$, you obtain another relationship, $ad-bc=1$. And when you multiply $a=d$ and $c=-b$, you get $ac=-bd$.
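The effect of the determinant constraint can also be seen in a simulation. The sketch below is only one possible setup and rests on assumptions that are not in the thread: the rotation angle is drawn uniformly, and dropping $ad-bc=1$ is modelled by giving the determinant a random sign. The helper name `covariance_eigenvalues` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 50_000
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)

def covariance_eigenvalues(sign):
    """Eigenvalues of Cov(a, b, c, d) for M = [[a, b], [c, d]] with det M = sign."""
    a = np.cos(theta)
    b = -sign * np.sin(theta)
    c = np.sin(theta)
    d = sign * np.cos(theta)
    return np.linalg.eigvalsh(np.cov(np.vstack([a, b, c, d])))

# det M = +1 everywhere: pure rotations, so a = d and b = -c hold exactly.
eig_rotations = covariance_eigenvalues(np.ones(n_samples))

# det M = +1 or -1 at random: orthogonal matrices without the determinant constraint.
eig_orthogonal = covariance_eigenvalues(rng.choice([-1.0, 1.0], size=n_samples))

print(eig_rotations)
print(eig_orthogonal)
```

With all four constraints, two eigenvalues are zero up to rounding. Once the determinant is allowed to change sign, the linear relations $a=d$ and $b=-c$ no longer hold and all four eigenvalues are clearly positive.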

Sextus Empiricus

I guess this (and the answer by whuber) is an indirect answer to my question (which was: "is linear dependence the only way to obtain a zero eigenvalue") in this way: even if the dependence between the random variables is non-linear, it can always be rewritten as a linear dependence by just writing $Y=\sum_i \nu_i X_i$. Although I was really looking for a way to characterize the possible non-linear constraints themselves, I guess it is nevertheless a useful result.

Adam

Yes, I know... what I'm saying is that if there is a non-linear dependence and there is a zero eigenvalue, then by your answer, it means that the non-linear dependence can be "factored" in some way into a linear dependence. It is a weaker version of what I was looking for, but still something.

Adam

You are giving an example that does not work, which does not mean that it cannot be the case...

Adam

Here is a counter-example of what you're saying (if you think it is not, then it might help us find what is wrong with my formulation of the problem :) ): Take a 2-by-2 random matrix $M$, with the non-linear constraints $M M^T=1$ and $\det M=1$. These 3 non-linear constraints can be rewritten in terms of 2 linear constraints and one non-linear one, meaning that the covariance matrix has two eigenvectors with 0 eigenvalue. Remove the constraint $\det M=1$, and they disappear.

Adam

$M_{11}=X_1$, $M_{12}=X_2$, $M_{21}=X_3$ and $M_{22}=X_4$. The constraints are $X_1^2+X_2^2=1$, $X_3^2+X_4^2=1$, $X_1 X_3+X_2 X_4=0$ (only two are independent). They do not imply a zero eigenvalue. However, adding $X_1 X_4-X_2 X_3=1$ does imply two eigenvectors with 0 eigenvalues.

Adam

2

Suppose $C$ has an eigenvector $v$ with corresponding eigenvalue $0$ , then $\operatorname{var}(v^T X) = v^T Cv = 0$ . Thus, by Chebyshev's inequality, $v^TX$ is almost surely constant and equal to $v^T E [X]$ . That is, every zero eigenvalue corresponds to a linear restriction, namely $v^T X = v^T E[X]$ . There is no need to consider any special cases.
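To see that the constant in the linear restriction is $v^T E[X]$ and need not be zero, here is a small simulated example. It is a sketch under assumptions made only for illustration: Gaussian variables with nonzero means, a constraint $X_1+X_2+X_3=3$ imposed by construction, and the sample mean used in place of $E[X]$.

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples = 10_000

x1 = rng.normal(loc=1.0, size=n_samples)
x2 = rng.normal(loc=-2.0, size=n_samples)
x3 = 3.0 - x1 - x2            # enforces X_1 + X_2 + X_3 = 3 on every sample
X = np.vstack([x1, x2, x3])

C = np.cov(X)
eigenvalues, eigenvectors = np.linalg.eigh(C)
v = eigenvectors[:, 0]        # eigenvector of the smallest eigenvalue (about 0)

values = v @ X                # v^T X evaluated on every sample
target = v @ X.mean(axis=1)   # v^T E[X], with E[X] replaced by the sample mean

print(eigenvalues[0])
print(values.min(), values.max(), target)
```

Every sample gives the same value of $v^T X$, and that value matches $v^T E[X]$; since `eigh` returns unit-length eigenvectors, its magnitude here is $\sqrt 3$ rather than $3$.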

Thus, we conclude:

"are linear constraints the only way to induce zero eigenvalues [?]"

Yes.

"can non-linear constraints on the random variables also generate zero eigenvalues of C ?"

Yes, if they imply linear constraints.

ekvall

I agree. I was hoping that one could be more specific on the kind of non-linear constraints, but I guess that it is hard to do better if we do not specify the constraints.

Adam

2

The covariance matrix $C$ of $X$ is symmetric, so you can diagonalize it as $C=Q\Lambda Q^T$, with the eigenvalues in the diagonal matrix $\Lambda.$ Rewriting this as $\Lambda=Q^TCQ$, the rhs is the covariance matrix of $Q^TX$, so zero eigenvalues on the lhs correspond to linear combinations of $X$ with degenerate distributions.
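One way to make the identity $Q^TCQ=\text{cov}(Q^TX)$ concrete is to rotate simulated data and recompute the covariance. This is a minimal sketch with made-up data containing one exact dependency ($X_3=X_1+X_2$), using the sample covariance throughout:

```python
import numpy as np

rng = np.random.default_rng(6)
n_samples = 10_000

x1 = rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
x3 = x1 + x2                          # one exact linear dependency
X = np.vstack([x1, x2, x3])

C = np.cov(X)
eigenvalues, Q = np.linalg.eigh(C)    # C = Q diag(eigenvalues) Q^T

Z = Q.T @ X                           # rotated variables Q^T X
cov_Z = np.cov(Z)                     # should equal Q^T C Q = Lambda

print(np.round(cov_Z, 6))
print(eigenvalues)
```

The covariance of the rotated variables is diagonal, and the coordinate belonging to the zero eigenvalue has zero variance, i.e. it is a degenerate linear combination of the original variables.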

Hasse1987

This is a very nice concise description, but how could we make it more intuitive that $Q^TCQ = \text{cov}(Q^TX)$?

Sextus Empiricus
