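First, some notation. Let $\{X_t\}_{1,\dots,m}$ and $\{Y_t\}_{1,\dots,n}$ denote the categorical sequences associated with $X_m$ and $Y_n$, i.e. $\Pr\{X_t = i\} = a_i$, $\Pr\{Y_t = i\} = b_i$. Let $N = n + m$. Consider the binarizations

$$\begin{align}
X^*_i &= (X^*_{1,i},\dots,X^*_{N,i}) = (\delta_{i,X_1},\dots,\delta_{i,X_m},0,\dots,0)\\
Y^*_i &= (Y^*_{1,i},\dots,Y^*_{N,i}) = (0,\dots,0,\delta_{i,Y_1},\dots,\delta_{i,Y_n})
\end{align}$$

where $\delta_{i,j} \equiv 1_{i=j}$ is the Kronecker delta. So we have

$$\begin{align}
X_{m,i} &= \sum_{t=1}^{N} X^*_{t,i} = \sum_{t=1}^{m} \delta_{i,X_t}\\
Y_{n,i} &= \sum_{t=1}^{N} Y^*_{t,i} = \sum_{t=1}^{n} \delta_{i,Y_t}
\end{align}$$

Substituting $\hat c_i = (X_{m,i} + Y_{n,i})/(n+m)$ gives

$$\begin{align}
X_{m,i} - m\hat c_i &= \frac{(n+m)X_{m,i} - m(X_{m,i} + Y_{n,i})}{n+m} = \frac{nX_{m,i} - mY_{n,i}}{n+m}\\
Y_{n,i} - n\hat c_i &= \frac{(n+m)Y_{n,i} - n(X_{m,i} + Y_{n,i})}{n+m} = \frac{mY_{n,i} - nX_{m,i}}{n+m}
\end{align}$$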
So we can write the test statistic as
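$$\begin{align}
S &= \sum_{i=1}^{k} \frac{(X_{m,i} - m\hat c_i)^2}{m\hat c_i} + \sum_{i=1}^{k} \frac{(Y_{n,i} - n\hat c_i)^2}{n\hat c_i}\\
&= \sum_{i=1}^{k} \frac{(nX_{m,i} - mY_{n,i})^2}{(n+m)^2\, m\hat c_i} + \sum_{i=1}^{k} \frac{(nX_{m,i} - mY_{n,i})^2}{(n+m)^2\, n\hat c_i}\\
&= \sum_{i=1}^{k} \frac{(nX_{m,i} - mY_{n,i})^2}{nm(n+m)\,\hat c_i}
\end{align}$$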
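As a quick numerical sanity check of this algebra, the following Python sketch evaluates the statistic in its original two-sum form and in the simplified single-sum form on one pair of count vectors. The function names and the counts are hypothetical, chosen only for illustration; only the two formulas come from the derivation.

```python
def chi2_two_sum(x, y):
    """Two-sum form: sum_i (X_i - m c_i)^2/(m c_i) + sum_i (Y_i - n c_i)^2/(n c_i)."""
    m, n = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        c = (xi + yi) / (n + m)  # pooled estimate c_hat_i
        total += (xi - m * c) ** 2 / (m * c) + (yi - n * c) ** 2 / (n * c)
    return total


def chi2_single_sum(x, y):
    """Single-sum form: sum_i (n X_i - m Y_i)^2 / (n m (n + m) c_i)."""
    m, n = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        c = (xi + yi) / (n + m)  # pooled estimate c_hat_i
        total += (n * xi - m * yi) ** 2 / (n * m * (n + m) * c)
    return total


# Hypothetical category counts with k = 4 (every pooled count is nonzero).
x_counts = [12, 30, 8, 10]   # m = 60
y_counts = [20, 25, 15, 40]  # n = 100

s1 = chi2_two_sum(x_counts, y_counts)
s2 = chi2_single_sum(x_counts, y_counts)
print(s1, s2)  # the two values should agree up to floating-point rounding
```

Both functions assume every category has a nonzero pooled count, since otherwise $\hat c_i = 0$ appears in a denominator.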
Next note that
$$nX_{m,i} - mY_{n,i} = \sum_{t=1}^{N} \left(nX^*_{t,i} - mY^*_{t,i}\right) = Z_i$$
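with the following properties (taking $b_i = a_i$, i.e. under the null hypothesis, and $i \neq j$ in the covariance)

$$\begin{align}
\operatorname{E}[Z_i] &= n\operatorname{E}[X_{m,i}] - m\operatorname{E}[Y_{n,i}] = nm\,a_i - nm\,a_i = 0\\
\operatorname{Var}[Z_i] &= \operatorname{Var}[nX_{m,i} - mY_{n,i}]\\
&= n^2\operatorname{Var}[X_{m,i}] + m^2\operatorname{Var}[Y_{n,i}] \qquad \text{(note $X_{m,i}$ and $Y_{n,i}$ are independent)}\\
&= n^2 m\,a_i(1-a_i) + m^2 n\,a_i(1-a_i)\\
&= nm(n+m)\,a_i(1-a_i)\\
\operatorname{Cov}[Z_i, Z_j] &= \operatorname{E}[Z_i Z_j] - \operatorname{E}[Z_i]\operatorname{E}[Z_j]\\
&= \operatorname{E}[(nX_{m,i} - mY_{n,i})(nX_{m,j} - mY_{n,j})]\\
&= n^2(-m\,a_i a_j + m^2 a_i a_j) - 2n^2 m^2 a_i a_j + m^2(-n\,a_i a_j + n^2 a_i a_j)\\
&= -nm(n+m)\,a_i a_j
\end{align}$$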
and so by multivariate CLT we have
$$\frac{1}{\sqrt{nm(n+m)}}\,Z = \frac{nX_m - mY_n}{\sqrt{nm(n+m)}} \xrightarrow{D} N(0, \Sigma)$$
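where the $(i,j)$th element of $\Sigma$ is $\sigma_{ij} = a_i(\delta_{ij} - a_j)$. Since $\hat c = (\hat c_1,\dots,\hat c_k) \xrightarrow{p} (a_1,\dots,a_k) = a$, by Slutsky we have, with the division taken componentwise,

$$\frac{nX_m - mY_n}{\sqrt{nm(n+m)}\,\sqrt{\hat c}} \xrightarrow{D} N\!\left(0,\; I_k - \sqrt{a}\sqrt{a}'\right)$$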
where $I_k$ is the $k \times k$ identity matrix and $\sqrt{a} = (\sqrt{a_1},\dots,\sqrt{a_k})$. Since $I_k - \sqrt{a}\sqrt{a}'$ has eigenvalue 0 of multiplicity 1 and eigenvalue 1 of multiplicity $k-1$, by the continuous mapping theorem (or see Lemma 17.1, Theorem 17.2 of van der Vaart) we have

$$\sum_{i=1}^{k} \frac{(nX_{m,i} - mY_{n,i})^2}{nm(n+m)\,\hat c_i} \xrightarrow{D} \chi^2_{k-1}$$
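The two ingredients of this last step can be probed numerically. The sketch below is a rough check, not part of the proof: it verifies that $I_k - \sqrt{a}\sqrt{a}'$ is idempotent with trace $k-1$ (for a symmetric matrix this is equivalent to the stated eigenvalue structure), and it simulates the statistic under the null to compare its average with $k-1$, the mean of a $\chi^2_{k-1}$ variable. The probability vector `a`, the sample sizes, the seed, and all function names are assumptions made for illustration.

```python
import random


def sample_counts(probs, size, rng):
    """Category counts of `size` i.i.d. draws from the distribution `probs`."""
    counts = [0] * len(probs)
    for cat in rng.choices(range(len(probs)), weights=probs, k=size):
        counts[cat] += 1
    return counts


def statistic(x, y):
    """sum_i (n X_i - m Y_i)^2 / (n m (n + m) c_hat_i), skipping empty categories."""
    m, n = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        if xi + yi == 0:
            continue  # c_hat_i = 0 would divide by zero; the term is dropped
        c = (xi + yi) / (n + m)
        total += (n * xi - m * yi) ** 2 / (n * m * (n + m) * c)
    return total


def limit_covariance(a):
    """The matrix I_k - sqrt(a) sqrt(a)' as a list of rows."""
    r = [p ** 0.5 for p in a]
    k = len(a)
    return [[(1.0 if i == j else 0.0) - r[i] * r[j] for j in range(k)]
            for i in range(k)]


# Hypothetical setup: k = 4 categories, common distribution a for both samples.
rng = random.Random(0)
a = [0.1, 0.2, 0.3, 0.4]
m, n, reps = 150, 250, 2000
k = len(a)

# Idempotent (M M = M) with trace k - 1 means eigenvalues 0 (once) and 1 (k - 1 times).
M = limit_covariance(a)
M2 = [[sum(M[i][l] * M[l][j] for l in range(k)) for j in range(k)]
      for i in range(k)]
max_idempotency_error = max(abs(M2[i][j] - M[i][j])
                            for i in range(k) for j in range(k))
trace = sum(M[i][i] for i in range(k))

# Monte Carlo under the null: the average of S should be close to k - 1.
draws = [statistic(sample_counts(a, m, rng), sample_counts(a, n, rng))
         for _ in range(reps)]
mean_s = sum(draws) / reps
print(max_idempotency_error, trace, mean_s)
```

Matching the mean is only a necessary condition for the $\chi^2_{k-1}$ limit; comparing empirical quantiles of `draws` against chi-squared quantiles would be a stronger check.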