I suspect that a series of observed sequences form a Markov chain ...
$$X = \begin{pmatrix} A & C & D & D & B & A & C \\ B & A & A & C & A & D & A \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ B & C & A & D & A & B & E \end{pmatrix}$$
How can I check, however, whether they indeed respect the memoryless property
$P(X_i = x_i \mid X_j = x_j)$?
Or at least show that they are Markov in nature? Note that these are empirically observed sequences. Any thoughts?
EDIT
Just to add: the aim is to compare a predicted set of sequences with the observed ones. Comments on how best to compare them would be appreciated.
First-order transition matrix $M_{ij} = \dfrac{x_{ij}}{\sum_{k}^{m} x_{ik}}$
where $m$ = states A..E
$$M = \begin{pmatrix} 0.1834 & 0.3077 & 0.0769 & 0.1479 & 0.2840 \\ 0.4697 & 0.1136 & 0.0076 & 0.2500 & 0.1591 \\ 0.1827 & 0.2404 & 0.2212 & 0.1923 & 0.1635 \\ 0.2378 & 0.1818 & 0.0629 & 0.3357 & 0.1818 \\ 0.2458 & 0.1788 & 0.1173 & 0.1788 & 0.2793 \end{pmatrix}$$
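As a minimal sketch of how such a matrix can be estimated, assuming $x_{ij}$ denotes the number of observed one-step transitions from state $i$ to state $j$: the function name `transition_matrix` and the three short sequences below are hypothetical stand-ins, not the data behind the $M$ shown above.

```python
from collections import Counter

def transition_matrix(sequences, states):
    """Estimate M[i][j] = x_ij / sum_k x_ik from a list of state sequences."""
    counts = Counter()
    for seq in sequences:
        # Count one-step transitions inside each sequence only,
        # never across the boundary between two sequences.
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    matrix = []
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A state with no outgoing transitions gets a row of zeros.
        matrix.append([counts[(a, b)] / row_total if row_total else 0.0
                       for b in states])
    return matrix

# Hypothetical toy sequences over the states A..E.
sequences = ["ACDDBAC", "BAACADA", "BCADABE"]
states = "ABCDE"
M_hat = transition_matrix(sequences, states)
for state, row in zip(states, M_hat):
    print(state, [round(p, 3) for p in row])
```

Rows for states that are observed as a source sum to 1; with real data, every state should appear often enough for its row to be estimated reliably.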
Eigenvalues of M
$$E = \begin{pmatrix} 1.0000 & 0 & 0 & 0 & 0 \\ 0 & -0.2283 & 0 & 0 & 0 \\ 0 & 0 & 0.1344 & 0 & 0 \\ 0 & 0 & 0 & 0.1136-0.0430i & 0 \\ 0 & 0 & 0 & 0 & 0.1136+0.0430i \end{pmatrix}$$
Eigenvectors of M
$$V = \begin{pmatrix} 0.4472 & -0.5852 & -0.4219 & -0.2343-0.0421i & -0.2343+0.0421i \\ 0.4472 & 0.7838 & -0.4211 & -0.4479-0.2723i & -0.4479+0.2723i \\ 0.4472 & -0.2006 & 0.3725 & 0.6323 & 0.6323 \\ 0.4472 & -0.0010 & 0.7089 & 0.2123-0.0908i & 0.2123+0.0908i \\ 0.4472 & 0.0540 & 0.0589 & 0.2546+0.3881i & 0.2546-0.3881i \end{pmatrix}$$
Answers:
I wonder whether the following would give a valid Pearson $\chi^2$ test for proportions, as follows.
It is tempting for me to think that each $T_U \sim \chi^2_3$, so that the total $T \sim \chi^2_{12}$. However, I am not entirely sure of that, and would appreciate your thoughts on it. I am likewise not certain whether one needs to be paranoid about independence and would want to split the sample in halves to estimate $\hat p$ and $\bar p$.
The Markov property might be hard to test directly. But it might be enough to fit a model that assumes the Markov property and then test whether the model holds. It may turn out that the fitted model is a good approximation that is useful for you in practice, and you need not be concerned with whether the Markov property really holds.
A parallel can be drawn with linear regression. The usual practice is not to test whether linearity holds, but whether the linear model is a useful approximation.
To make the suggestion of the previous reply concrete, you first want to estimate the Markov probabilities, assuming the chain is Markov. See the reply here: Estimating Markov Chain Probabilities
You should get a 4 x 4 matrix based on the proportion of transitions from state A to A, A to B, etc. Call this matrix $M$. $M^2$ should then be the two-step transition matrix: A to A in 2 steps, and so on. You can then test whether your observed two-step transition matrix is similar to $M^2$.
Since you have a lot of data for the number of states, you could estimate $M$ from one half of the data and test $M^2$ using the other half: you are testing observed frequencies against the theoretical probabilities of a multinomial. That should give you an idea of how far off you are.
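The split-sample comparison could be sketched roughly as follows, assuming a single long sequence stored as a string. The helper names are hypothetical, and the returned Pearson-type statistic is only a rough measure of discrepancy: overlapping two-step pairs are not independent, so comparing it with a chi-square reference is an approximation at best.

```python
from collections import Counter

def count_transitions(seq, lag):
    """Count pairs (X_t, X_{t+lag}) in a single sequence."""
    return Counter(zip(seq, seq[lag:]))

def row_normalize(counts, states):
    """Turn transition counts into row-stochastic probabilities."""
    rows = []
    for a in states:
        total = sum(counts[(a, b)] for b in states)
        rows.append([counts[(a, b)] / total if total else 0.0 for b in states])
    return rows

def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def two_step_pearson(seq, states):
    """Fit M on the first half, compare M^2 with two-step counts in the second."""
    half = len(seq) // 2
    fit, holdout = seq[:half], seq[half:]
    m = row_normalize(count_transitions(fit, 1), states)
    m2 = matmul(m, m)
    observed = count_transitions(holdout, 2)
    stat = 0.0
    for i, a in enumerate(states):
        n_a = sum(observed[(a, b)] for b in states)
        for j, b in enumerate(states):
            expected = n_a * m2[i][j]
            # Cells with zero expected count are skipped in this sketch.
            if expected > 0:
                stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat

# A deterministic cycle is first-order Markov; "AABB..." needs two steps of memory.
print(two_step_pearson("ABC" * 20, "ABC"))
print(two_step_pearson("AABB" * 15, "AB"))
```

The second toy sequence is deliberately not first-order Markov, so its fitted $M^2$ disagrees with the observed two-step counts.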
Another possibility would be to see whether the basic state proportions (proportion of time spent in A, time spent in B, and so on) match the eigenvector of the unit eigenvalue of $M$ (the left eigenvector, normalized to sum to 1). If your series has reached some sort of steady state, the proportion of time in each state should tend to that limit.
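A small sketch of this check, using the rounded $M$ from the question: the long-run proportions $\pi$ satisfy $\pi M = \pi$, so they come from the left eigenvector of $M$, whereas the constant 0.4472 column of $V$ in the question is the right eigenvector for the unit eigenvalue. Plain power iteration is used here instead of an eigen-solver, and the function names are hypothetical.

```python
M = [
    [0.1834, 0.3077, 0.0769, 0.1479, 0.2840],
    [0.4697, 0.1136, 0.0076, 0.2500, 0.1591],
    [0.1827, 0.2404, 0.2212, 0.1923, 0.1635],
    [0.2378, 0.1818, 0.0629, 0.3357, 0.1818],
    [0.2458, 0.1788, 0.1173, 0.1788, 0.2793],
]

def stationary_distribution(matrix, iterations=1000):
    """Approximate pi with pi = pi * M by repeated multiplication."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(pi)
        # Renormalize because the rounded rows do not sum to exactly 1.
        pi = [p / total for p in pi]
    return pi

def state_proportions(seq, states):
    """Fraction of time the observed sequence spends in each state."""
    return [seq.count(s) / len(seq) for s in states]

pi = stationary_distribution(M)
print([round(p, 4) for p in pi])
```

The output of `state_proportions` on the observed data can then be set beside `pi`; large gaps suggest either that the chain has not reached a steady state or that the fitted model is off.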
Beyond the Markov Property (MP), a further property is Time Homogeneity (TH): $X_t$ can be Markov but with its transition matrix $P(t)$ depending on time $t$. E.g., it may depend on the weekday at $t$ if observations are daily, and then a dependence of $X_t$ on $X_{t-7}$ conditional on $X_{t-1}$ may be diagnosed if TH is unduly assumed.
Assuming TH holds, a possible check for MP is testing that $X_t$ is independent of $X_{t-2}$ conditional on $X_{t-1}$, as Michael Chernick and StasK suggested. This can be done by using a test for contingency tables. We can build the $n$ contingency tables of $X_t$ and $X_{t-2}$ conditional on $\{X_{t-1} = x_j\}$ for the $n$ possible values $x_j$, and test for independence. This can also be done using $X_{t-\ell}$ with $\ell > 1$ in place of $X_{t-2}$.
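A rough sketch of the stratified tables, assuming a single sequence stored as a string: one table of $X_{t-\ell}$ against $X_t$ is built for each value of $X_{t-1}$, and a Pearson independence statistic with its degrees of freedom is computed per table. The function names are hypothetical, and cells with small expected counts would need more care than this sketch gives them.

```python
from collections import Counter

def conditional_tables(seq, lag=2):
    """For each value of X_{t-1}, count the pairs (X_{t-lag}, X_t)."""
    tables = {}
    for t in range(lag, len(seq)):
        middle = seq[t - 1]
        tables.setdefault(middle, Counter())[(seq[t - lag], seq[t])] += 1
    return tables

def chi_square_independence(table):
    """Pearson statistic and degrees of freedom for one two-way table."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    n = sum(table.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (table[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

# Toy sequence with two steps of memory, so independence should fail.
tables = conditional_tables("AABB" * 15)
for middle, table in sorted(tables.items()):
    print(middle, chi_square_independence(table))
```

Under MP each table should look like an independence table, so large statistics relative to the degrees of freedom point to extra memory.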
In R, contingency tables or arrays are easily produced thanks to the factor facility and the functions apply, sweep. The idea above can also be exploited graphically. Packages ggplot2 or lattice easily provide conditional plots to compare the conditional distributions $p(X_t \mid X_{t-1} = x_j, X_{t-2} = x_i)$. For instance, setting $i$ as row index and $j$ as column index in a trellis should, under MP, lead to similar distributions within a column.
Chap. 5 of the book The statistical analysis of stochastic processes in time by J.K. Lindsey contains other ideas for checking assumptions.
I think placida and mpiktas have both given very thoughtful and excellent approaches.
I am answering because I just want to add that one could construct a test to see whether $P(X_i = x \mid X_{i-1} = y)$ is different from $P(X_i = x \mid X_{i-1} = y \text{ and } X_{i-2} = z)$.
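The two sample proportions might be computed along these lines, assuming a single sequence stored as a string. The function name is hypothetical, and only the point estimates and their denominators are returned; the question of their variances is not addressed here.

```python
def conditional_proportions(seq, x, y, z):
    """Return (estimate, count) for P(x | y) and for P(x | y, z)."""
    n_y = n_yx = n_zy = n_zyx = 0
    # Start at t = 2 so both estimates use the same set of time points.
    for t in range(2, len(seq)):
        if seq[t - 1] != y:
            continue
        n_y += 1
        n_yx += seq[t] == x
        if seq[t - 2] == z:
            n_zy += 1
            n_zyx += seq[t] == x
    p_one_step = n_yx / n_y if n_y else float("nan")
    p_two_step = n_zyx / n_zy if n_zy else float("nan")
    return (p_one_step, n_y), (p_two_step, n_zy)

# Toy sequence where the state two steps back clearly matters.
print(conditional_proportions("AABB" * 15, x="B", y="A", z="A"))
```

The counts in the output also help with choosing triples $(z, y, x)$ that occur often enough for the comparison to be meaningful.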
I would pick values for $x$, $y$ and $z$ for which there are a large number of cases where the transition from $z$ to $y$ to $x$ occurs. Compute sample estimates for both probabilities, then test for a difference in proportions. The difficult aspect of this is getting the variances of the two estimates under the null hypothesis that the proportions are equal and the chain is stationary and Markov. Under that null hypothesis, look at all two-stage transitions and compare them to their corresponding three-stage transitions, but only include outcomes where these sets of paired outcomes are separated by at least 2 time points. Then the sequence of joint outcomes, where a $z$ to $y$ to $x$ transition is defined as a success and all other two-stage transitions to $x$ as failures, represents a set of independent Bernoulli trials. The same would work for defining all $y$ to $x$ transitions as successes and other one-stage transitions to $x$ as failures.
Then the test statistic would be the difference between these estimated proportions. The complication to the standard comparison of the Bernoulli sequences is that they are correlated. But you could do a bootstrap test of binomial proportions in this case.
The other possibility is to construct a two-by-two table of the two-stage and three-stage paired outcomes, where 0 is failure and 1 is success and the cell frequencies are counts for the pairs (0,0), (0,1), (1,0) and (1,1), where the first component is the two-stage outcome and the second is the corresponding three-stage outcome. You can then apply McNemar's test to the table.
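For the last step only, here is a minimal sketch of McNemar's statistic computed from a list of paired 0/1 outcomes. How the pairs are formed follows the description above and is not reproduced here; the example pairs are made up for illustration, and the statistic is usually referred to a chi-square distribution with 1 degree of freedom when the discordant counts are not too small.

```python
def mcnemar_statistic(pairs):
    """McNemar statistic (b - c)^2 / (b + c) from paired 0/1 outcomes."""
    pairs = list(pairs)
    # Only the discordant pairs (0, 1) and (1, 0) enter the statistic.
    b = sum(1 for pair in pairs if pair == (0, 1))
    c = sum(1 for pair in pairs if pair == (1, 0))
    if b + c == 0:
        return float("nan")
    return (b - c) ** 2 / (b + c)

# Made-up paired outcomes: (two-stage outcome, three-stage outcome).
example = [(0, 1)] * 10 + [(1, 0)] * 5 + [(1, 1)] * 3 + [(0, 0)] * 2
print(mcnemar_statistic(example))
```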
You could bin the data into evenly spaced intervals, then compute the unbiased sample variances of the subsets $\{X_{n+1} : X_n = x_1, X_{n-k} = x_2\}$. By the law of total variance,
$$\operatorname{Var}\big[E(X_{n+1} \mid X_n, X_{n-k}) \mid X_n\big] = \operatorname{Var}[X_{n+1} \mid X_n] - E\big(\operatorname{Var}[X_{n+1} \mid X_n, X_{n-k}] \mid X_n\big)$$
The LHS, if it is almost zero, provides evidence that the transition probabilities do not depend on $X_{n-k}$, though it is clearly a weaker statement: e.g., let $X_{n+1} \sim N(X_n, X_{n-1})$. Taking the expected value of both sides of the above equation, the RHS can be computed from the sample variances (i.e., replacing expected values with averages). If the expected value of the variance is zero, then the variance is 0 almost surely.
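A sketch of that computation for one value $x_1$ of $X_n$, assuming the series is numeric and already binned: it estimates $\operatorname{Var}[X_{n+1} \mid X_n = x_1]$ minus the count-weighted average of the sample variances within each $X_{n-k}$ subset. The function name and the weighting by subset size are assumptions of this sketch, and with unbiased sample variances the result can come out slightly negative.

```python
from collections import defaultdict
from statistics import variance

def explained_variance(series, x1, k=1):
    """Estimate Var[X_{n+1} | X_n = x1] - E(Var[X_{n+1} | X_n = x1, X_{n-k}])."""
    groups = defaultdict(list)
    for n in range(k, len(series) - 1):
        if series[n] == x1:
            groups[series[n - k]].append(series[n + 1])
    # A sample variance needs at least two points per subset.
    usable = [vals for vals in groups.values() if len(vals) >= 2]
    pooled = [v for vals in usable for v in vals]
    if not usable:
        return float("nan")
    total = len(pooled)
    within = sum(len(vals) * variance(vals) for vals in usable) / total
    return variance(pooled) - within

# Next value fixed by the current one: nothing left for X_{n-1} to explain.
print(explained_variance([0, 1, 2] * 20, x1=0))
# Next value depends on X_{n-1}: the difference is clearly positive.
print(explained_variance([0, 0, 1, 1] * 15, x1=0))
```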