I am interested in the following one-sided version of the Chebyshev inequality (the Cantelli inequality):
$$P(X - \mathbb{E}(X) \geq t) \leq \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + t^2}.$$
Basically, if you know the population mean and variance, you can compute an upper bound on the probability of observing a certain value. (That was my understanding, at least.)
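To make that concrete, here is a minimal sketch of how such a bound would be evaluated (my own illustration, not part of the original question; the helper name `cantelli_bound` is hypothetical):

```python
# Cantelli's one-sided bound: P(X - mu >= t) <= var / (var + t^2).
# Minimal sketch assuming the population variance is known exactly.

def cantelli_bound(var: float, t: float) -> float:
    """Upper bound on P(X - mu >= t), valid for any distribution with variance var."""
    if t <= 0 or var < 0:
        raise ValueError("require t > 0 and var >= 0")
    return var / (var + t ** 2)

# Example: variance 4, asking about a deviation of t = 3 above the mean.
print(cantelli_bound(4.0, 3.0))  # 4 / (4 + 9) = 0.3077...
```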
However, I would like to use the sample mean and sample variance instead of the actual population mean and variance.
My guess is that since this would introduce more uncertainty, the upper bound would increase.
Is there an inequality analogous to the one above, but which uses the sample mean and variance?
Edit: The "sample" analog of the Chebyshev inequality (not one-sided) has been worked out, and the Wikipedia page has some details. However, I am not sure how it would translate to the one-sided case I have above.
Answers:
Yes, we can get an analogous result using the sample mean and variance, with perhaps a couple of slight surprises emerging in the process.
First, we need to refine the question statement just a little bit and set out a few assumptions. Importantly, it should be clear that we cannot hope to replace the population variance with the sample variance on the right-hand side since the latter is random! So, we refocus our attention on the equivalent inequality
$$P(X - \mathbb{E}(X) \geq t\sigma) \leq \frac{1}{1+t^2},$$
which follows from the original by the substitution $t \mapsto t\sigma$.
Second, we assume that we have a random sample $X_1, \ldots, X_n$ and we are interested in an upper bound for the analogous quantity
$$P(X_1 - \bar{X} \geq tS),$$
where $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
A half-step forward
Note that already by applying the original one-sided Chebyshev inequality to $X_1 - \bar{X}$, which has mean zero and variance $\frac{n-1}{n}\sigma^2$, we get that
$$P\left(X_1 - \bar{X} \geq t\sigma\sqrt{\tfrac{n-1}{n}}\right) \leq \frac{1}{1+t^2},$$
where $\sigma^2 = \mathrm{Var}(X_1)$. So a deviation threshold slightly smaller than $t\sigma$ already suffices: any particular observation tends to be a little closer to the sample mean, to which it contributes, than to the population mean.
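For completeness, the variance computation behind this (a standard i.i.d. calculation, not spelled out in the original answer):
$$\mathrm{Var}(X_1 - \bar{X}) = \mathrm{Var}\!\left(\tfrac{n-1}{n}X_1 - \tfrac{1}{n}\sum_{i=2}^n X_i\right) = \left(\tfrac{n-1}{n}\right)^2\sigma^2 + \tfrac{n-1}{n^2}\,\sigma^2 = \tfrac{n-1}{n}\,\sigma^2.$$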
A sample version of one-sided Chebyshev

Claim: Let $X_1, \ldots, X_n$ be a random sample such that $P(S = 0) = 0$. Then, for $t > 0$,
$$P(X_1 - \bar{X} \geq tS) \leq \frac{1}{1 + \frac{n}{n-1} t^2}.$$
In particular, the sample version is tighter than the original population version.

Note: We do not assume that the $X_i$ have either finite mean or variance!
Proof. The idea is to adapt the proof of the original one-sided Chebyshev inequality and employ symmetry in the process. First, set $Y_i = X_i - \bar{X}$ for notational convenience. Then, observe that, by the symmetry of the $Y_i$,
$$P(Y_1 \geq tS) = \frac{1}{n}\sum_{i=1}^n P(Y_i \geq tS) = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{(Y_i \geq tS)}\right].$$
Now, for any $c > 0$, on the event $\{S > 0\}$,
$$\mathbf{1}_{(Y_i \geq tS)} = \mathbf{1}_{(Y_i + tcS \geq tS(1+c))} \leq \mathbf{1}_{((Y_i + tcS)^2 \geq t^2(1+c)^2 S^2)} \leq \frac{(Y_i + tcS)^2}{t^2(1+c)^2 S^2}.$$
Then,
$$\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{(Y_i \geq tS)} \leq \frac{1}{n}\sum_{i=1}^n \frac{(Y_i + tcS)^2}{t^2(1+c)^2 S^2} = \frac{(n-1) + n t^2 c^2}{n t^2 (1+c)^2},$$
since $\bar{Y} = 0$ and $\sum_{i=1}^n Y_i^2 = (n-1)S^2$.
The right-hand side is a constant (!), so taking expectations on both sides yields
$$P(X_1 - \bar{X} \geq tS) \leq \frac{(n-1) + n t^2 c^2}{n t^2 (1+c)^2}.$$
Minimizing the right-hand side over $c > 0$ gives $c = \frac{n-1}{n t^2}$, and substituting back, after a little algebra,
$$P(X_1 - \bar{X} \geq tS) \leq \frac{1}{1 + \frac{n}{n-1} t^2},$$
which completes the proof.
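As a quick sanity check on the finished bound, here is a small Monte Carlo sketch (my own, not part of the original answer; the exponential distribution is an arbitrary continuous choice) that estimates the left-hand side and compares it with the right-hand side:

```python
# Monte Carlo check of P(X_1 - Xbar >= t*S) <= 1 / (1 + n/(n-1) * t^2)
# for a continuous distribution, so that P(S = 0) = 0 holds automatically.

import numpy as np

rng = np.random.default_rng(0)
n, t, reps = 10, 1.5, 200_000

x = rng.exponential(size=(reps, n))   # skewed, continuous sample
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)             # bias-corrected S, matching the proof
empirical = np.mean(x[:, 0] - xbar >= t * s)
bound = 1.0 / (1.0 + n / (n - 1) * t ** 2)

print(f"empirical: {empirical:.4f}  bound: {bound:.4f}")  # empirical <= bound
```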
That pesky technical condition
Note that we had to assume $P(S = 0) = 0$ in order to be able to divide by $S^2$ in the analysis. This is no problem for absolutely continuous distributions, but poses an inconvenience for discrete ones. For a discrete distribution, there is some positive probability that all observations are equal, in which case $Y_i = tS = 0$ for all $i$ and every $t > 0$, so the nonstrict event $\{Y_1 \geq tS\}$ occurs.
We can wiggle our way out by setting $q = P(S = 0)$. Then, a careful accounting of the argument shows that everything goes through virtually unchanged and we get

Corollary 1. If $q = P(S = 0) > 0$, then
$$P(X_1 - \bar{X} \geq tS) \leq (1-q)\,\frac{1}{1 + \frac{n}{n-1} t^2} + q.$$
Proof. Split on the events $\{S > 0\}$ and $\{S = 0\}$. The previous proof goes through on $\{S > 0\}$, and the case $\{S = 0\}$ is trivial.
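To make the discrete case concrete, here is a small sketch (mine, not from the original answer) for i.i.d. Bernoulli($p$) draws, where $S = 0$ exactly when all $n$ observations coincide, so $q = p^n + (1-p)^n$:

```python
# Corollary 1 bound for an i.i.d. Bernoulli(p) sample of size n,
# where q = P(S = 0) = p**n + (1-p)**n (all observations equal).

def corollary1_bound(n: int, t: float, p: float) -> float:
    q = p ** n + (1 - p) ** n                  # P(all n draws coincide)
    base = 1.0 / (1.0 + n / (n - 1) * t ** 2)  # bound on the event {S > 0}
    return (1 - q) * base + q

print(corollary1_bound(n=10, t=2.0, p=0.5))    # q = 2 * 0.5**10, small but nonzero
```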
A slightly cleaner inequality results if we replace the nonstrict inequality in the probability statement with a strict version:

Corollary 2. Let $q = P(S = 0)$ (possibly zero). Then,
$$P(X_1 - \bar{X} > tS) \leq (1-q)\,\frac{1}{1 + \frac{n}{n-1} t^2}.$$
Final remark: The sample version of the inequality required no assumptions on $X$ (other than that it not be almost-surely constant in the nonstrict inequality case, which the original version also tacitly assumes), in essence because the sample mean and sample variance always exist whether or not their population analogs do.
This is just a complement to @cardinal 's ingenious answer. Samuelson's Inequality states that, for a sample of size $n$, when we have at least three distinct values among the realized $x_i$'s, it holds that
$$x_i - \bar{x} < s\sqrt{n-1}, \qquad i = 1, \ldots, n,$$
where $s$ is calculated using the biased estimator of the variance, $s^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$.
Then, using the notation of Cardinal's answer, we can state that
$$P\left(X_1 - \bar{X} \geq S\sqrt{n-1}\right) = 0 \;\text{a.s.} \qquad [1]$$
Since we require at least three distinct values, we will have $S \neq 0$ by assumption. So setting $t = \sqrt{n-1}$ in Cardinal's Inequality (the initial version) we obtain
$$P\left(X_1 - \bar{X} \geq S\sqrt{n-1}\right) \leq \frac{1}{1+n}. \qquad [2]$$
Eq. [2] is of course compatible with eq. [1]. The combination of the two tells us that Cardinal's Inequality is useful as a probabilistic statement for $0 < t < \sqrt{n-1}$.
If Cardinal's Inequality requires $S$ to be calculated bias-corrected (call this $\tilde{S}$), then, since $s = \tilde{S}\sqrt{(n-1)/n}$, the equations become
$$x_i - \bar{x} < \tilde{S}\,\frac{n-1}{\sqrt{n}}, \qquad i = 1, \ldots, n,$$
$$P\left(X_1 - \bar{X} \geq \tilde{S}\,\frac{n-1}{\sqrt{n}}\right) = 0 \;\text{a.s.} \qquad [1a]$$
and we choose $t = \frac{n-1}{\sqrt{n}}$ to obtain through Cardinal's Inequality
$$P\left(X_1 - \bar{X} \geq \tilde{S}\,\frac{n-1}{\sqrt{n}}\right) \leq \frac{1}{n}. \qquad [2a]$$
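A quick numerical sketch (my own addition, not from the answer) of the deterministic bound [1a]: for continuous data the maximum deviation never reaches $\tilde{S}(n-1)/\sqrt{n}$, which is exactly why the probabilistic bound only carries information for smaller $t$:

```python
# Check Samuelson's bound on simulated continuous samples: for any sample
# with at least three distinct values, max_i (x_i - xbar) < S_tilde*(n-1)/sqrt(n),
# where S_tilde is the bias-corrected sample standard deviation.

import numpy as np

rng = np.random.default_rng(1)
n = 10
for _ in range(1000):
    x = rng.normal(size=n)                    # continuous, so values distinct a.s.
    xbar, s_tilde = x.mean(), x.std(ddof=1)
    threshold = s_tilde * (n - 1) / np.sqrt(n)
    assert (x - xbar).max() < threshold       # Samuelson: never violated

print("Samuelson bound held in all 1000 samples")
```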