Is there a reverse Chernoff bound, one showing that the tail probability is at least so large?

i.e., if $X_1, X_2, \ldots, X_n$
pr.probability
chernoff-bound
Ashwinkumar BV
Answers:
Here is an explicit proof that a standard Chernoff bound is tight up to constant factors in the exponent for a particular range of the parameters. (In particular, whenever the variables are 0 or 1, and 1 with probability 1/2 or less, and $\epsilon \in (0,1/2)$, and the Chernoff upper bound is less than a constant.)
If you find a mistake, please let me know.
Lemma 1. (tightness of Chernoff bound) Let $X$ be the average of $k$ independent, 0/1 random variables (r.v.).
For any $\epsilon \in (0,1/2]$ and $p \in (0,1/2]$, assuming $\epsilon^2 pk \ge 3$,

(i) If each r.v. is 1 with probability at most $p$, then
$$\Pr[X \le (1-\epsilon)p] \,\ge\, \exp(-9\epsilon^2 pk).$$

(ii) If each r.v. is 1 with probability at least $p$, then
$$\Pr[X \ge (1+\epsilon)p] \,\ge\, \exp(-9\epsilon^2 pk).$$
Proof. We use the following observation:
Claim 1. If $1 \le \ell \le k-1$, then
$$\binom{k}{\ell} \,\ge\, \frac{1}{e\sqrt{2\pi\ell}}\left(\frac{k}{\ell}\right)^{\ell}\left(\frac{k}{k-\ell}\right)^{k-\ell}.$$
Proof of Claim 1. By Stirling's approximation, $i! = \sqrt{2\pi i}\,(i/e)^i e^{\lambda}$ where $\lambda \in \big[\tfrac{1}{12i+1}, \tfrac{1}{12i}\big]$.
Thus $\binom{k}{\ell}$, which is $\frac{k!}{\ell!\,(k-\ell)!}$, is at least
$$\frac{\sqrt{2\pi k}\,\big(\tfrac{k}{e}\big)^k}{\sqrt{2\pi \ell}\,\big(\tfrac{\ell}{e}\big)^{\ell}\,\sqrt{2\pi (k-\ell)}\,\big(\tfrac{k-\ell}{e}\big)^{k-\ell}}\,\exp\Big(\frac{1}{12k+1} - \frac{1}{12\ell} - \frac{1}{12(k-\ell)}\Big).$$
Since $k \ge k-\ell$, the ratio of the square-root factors is at least $1/\sqrt{2\pi\ell}$, and since $\ell, k-\ell \ge 1$ the exponential correction is at least $e^{-1/6} \ge e^{-1}$; the powers of $e$ cancel to give $(k/\ell)^{\ell}(k/(k-\ell))^{k-\ell}$, which gives the claim. QED
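Claim 1 is also easy to spot-check numerically; here is a small Python sketch (the parameter grid is arbitrary, and logs are used to keep the numbers in range):

```python
import math

# Spot-check Claim 1:
#   comb(k, l) >= (k/l)^l * (k/(k-l))^(k-l) / (e * sqrt(2*pi*l))
# compared in log form to avoid overflow for large k.
for k in (10, 100, 1000):
    for l in range(1, k):
        lhs = math.log(math.comb(k, l))
        rhs = (-1 - 0.5 * math.log(2 * math.pi * l)
               + l * math.log(k / l) + (k - l) * math.log(k / (k - l)))
        assert lhs >= rhs, (k, l)
print("Claim 1 holds on the sampled range")
```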
Proof of Lemma 1 Part (i). Without loss of generality assume each 0/1 random variable in the sum $X$
is 1 with probability exactly $p$.
Note $\Pr[X \le (1-\epsilon)p]$ equals the sum $\sum_{i=0}^{\lfloor(1-\epsilon)pk\rfloor} \Pr[X = i/k]$,
and $\Pr[X = i/k] = \binom{k}{i} p^i (1-p)^{k-i}$.
Fix $\ell = \lfloor(1-2\epsilon)pk\rfloor + 1$.
The terms in the sum are increasing,
so the terms with index $i \ge \ell$
each have value at least $\Pr[X = \ell/k]$,
so their sum has total value at least
$(\epsilon pk - 2)\,\Pr[X = \ell/k]$ (there are at least $\lfloor(1-\epsilon)pk\rfloor - \ell + 1 \ge \epsilon pk - 2$ such terms).
To complete the proof, we show that
$$(\epsilon pk - 2)\,\Pr[X = \ell/k] \,\ge\, \exp(-9\epsilon^2 pk).$$
The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$
give $\epsilon pk \ge 6$, so $\epsilon pk - 2 \ge \frac{2}{3}\epsilon pk$,
and the left-hand side above is at least $\frac{2}{3}\,\epsilon pk \binom{k}{\ell} p^{\ell} (1-p)^{k-\ell}$.
Using Claim 1 to bound $\binom{k}{\ell}$,
this is in turn at least $AB$
where
$$A = \frac{2\,\epsilon pk}{3e\sqrt{2\pi\ell}}$$
and
$$B = \left(\frac{k}{\ell}\right)^{\ell}\left(\frac{k}{k-\ell}\right)^{k-\ell} p^{\ell}\,(1-p)^{k-\ell}.$$
To finish we show $A \ge \exp(-\epsilon^2 pk)$ and $B \ge \exp(-8\epsilon^2 pk)$.
Claim 2. $A \ge \exp(-\epsilon^2 pk)$.
Proof of Claim 2. The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$
imply
(i) $pk \ge 12$.
By definition, $\ell \le pk + 1$.
By (i), $pk + 1 \le 1.1\,pk$.
Thus, (ii) $\ell \le 1.1\,pk$.
Substituting the right-hand side of (ii) for $\ell$ in $A$ gives
(iii) $A \ge \frac{2}{3e}\,\epsilon\sqrt{pk/(2.2\pi)}$.
The assumption $\epsilon^2 pk \ge 3$
implies $\epsilon\sqrt{pk} \ge \sqrt{3}$,
which with (iii) gives (iv) $A \ge \frac{2}{3e}\sqrt{3/(2.2\pi)} \ge 0.1$.
From $\epsilon^2 pk \ge 3$ it follows that (v) $\exp(-\epsilon^2 pk) \le \exp(-3) \le 0.05$.
(iv) and (v) together give the claim. QED
Claim 3. $B \ge \exp(-8\epsilon^2 pk)$.
Proof of Claim 3. Fix $\delta$ such that $\ell = (1-\delta)pk$. The choice of $\ell$ implies $\delta \le 2\epsilon$,
so
the claim will hold as long as $B \ge \exp(-2\delta^2 pk)$.
Taking each side of this latter inequality to the power $-1/\ell$ and simplifying,
it is equivalent to
$$\frac{\ell}{pk}\left(\frac{k-\ell}{(1-p)k}\right)^{k/\ell - 1} \,\le\, \exp\Big(\frac{2\delta^2 pk}{\ell}\Big).$$
The choice of $\ell$ gives $0 < \delta < 1$; substituting $\ell = (1-\delta)pk$ above, taking logarithms, and using $\ln(1+z) \le z$, $\ln(1-\delta) \le -\delta - \delta^2/2$, and $p \le 1/2$ verifies the inequality, proving Claim 3. QED
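As a side note, the equivalent inequality above is easy to spot-check numerically over the relevant range $p \le 1/2$, $0 < \delta < 1$ (a sketch; it treats $\ell = (1-\delta)pk$ as a real number, and the sampled grid is arbitrary):

```python
import math

def claim3_holds(k, p, delta):
    """Check the inequality equivalent to Claim 3 at l = (1-delta)*p*k,
    comparing logarithms of the two sides."""
    l = (1 - delta) * p * k
    lhs = math.log(l / (p * k)) + (k / l - 1) * math.log((k - l) / ((1 - p) * k))
    rhs = 2 * delta**2 * p * k / l
    return lhs <= rhs

assert all(claim3_holds(k, p, d)
           for k in (100, 1000, 10000)
           for p in (0.05, 0.25, 0.5)
           for d in (0.01, 0.1, 0.5, 0.9))
print("Claim 3's equivalent inequality holds on the sampled grid")
```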
Claims 2 and 3 imply $AB \ge \exp(-\epsilon^2 pk)\exp(-8\epsilon^2 pk) = \exp(-9\epsilon^2 pk)$.
This implies part (i) of the lemma.
Proof of Lemma 1 Part (ii). Without loss of generality assume each random variable is 1 with probability exactly $p$.
Note $\Pr[X \ge (1+\epsilon)p] = \sum_{i=\lceil(1+\epsilon)pk\rceil}^{k} \Pr[X = i/k]$.
Fix $\hat\ell = \lceil(1+2\epsilon)pk\rceil - 1$.
The first terms in the sum, those with index $i \le \hat\ell$, each have value at least $\Pr[X = \hat\ell/k]$ (the terms are decreasing in this range), and there are at least $\epsilon pk - 2$ of them,
so they total at least $(\epsilon pk - 2)\,\Pr[X = \hat\ell/k]$,
which is at least $\exp(-9\epsilon^2 pk)$.
(The proof of that is the same as for (i), except with $\ell$ replaced by $\hat\ell$
and $\delta$ replaced by $-\hat\delta$, where $\hat\ell = (1+\hat\delta)pk$.)
QED
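For what it's worth, here is a minimal Python sanity check of part (i), comparing the exact binomial lower tail against $\exp(-9\epsilon^2 pk)$ at a few parameter settings satisfying the hypothesis $\epsilon^2 pk \ge 3$ (the helper names are just illustrative):

```python
import math

def log_binom_pmf(k, i, p):
    """log Pr[Bin(k, p) = i], via lgamma to avoid huge intermediate values."""
    return (math.lgamma(k + 1) - math.lgamma(i + 1) - math.lgamma(k - i + 1)
            + i * math.log(p) + (k - i) * math.log(1 - p))

def lower_tail(k, p, eps):
    """Pr[X <= (1 - eps) * p] for X the average of k i.i.d. Bernoulli(p) bits."""
    cutoff = math.floor((1 - eps) * p * k)
    return sum(math.exp(log_binom_pmf(k, i, p)) for i in range(cutoff + 1))

# Part (i): the exact tail should dominate exp(-9 eps^2 p k).
for k, p, eps in [(1000, 0.5, 0.08), (2000, 0.25, 0.08), (500, 0.1, 0.25)]:
    assert eps**2 * p * k >= 3          # the lemma's hypothesis
    tail, bound = lower_tail(k, p, eps), math.exp(-9 * eps**2 * p * k)
    print(f"k={k}, p={p}, eps={eps}: tail={tail:.3e} >= bound={bound:.3e}")
```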
The Berry-Esseen theorem can give tail probability lower bounds, as long as they are larger than $n^{-1/2}$.
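For instance, for $S_n$ a sum of $n$ i.i.d. Rademacher variables (variance 1, third absolute moment 1), Berry-Esseen gives $\Pr[S_n/\sqrt{n} \ge z] \ge 1 - \Phi(z) - C/\sqrt{n}$. A minimal sketch, assuming $C \le 0.4748$ (one published bound on the Berry-Esseen constant; the function names are illustrative):

```python
import math

def normal_upper_tail(z):
    """1 - Phi(z), computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def rademacher_tail_lower_bound(n, z, C=0.4748):
    """Berry-Esseen lower bound on Pr[(X_1 + ... + X_n)/sqrt(n) >= z] for
    i.i.d. Rademacher X_i; C = 0.4748 is an assumed bound on the constant."""
    return normal_upper_tail(z) - C / math.sqrt(n)

# Only informative once C/sqrt(n) is below the Gaussian tail itself:
n, z = 10**6, 2.0
print(rademacher_tail_lower_bound(n, z))  # ~0.0223, close to 1 - Phi(2) ~ 0.0228
```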
Another tool you can use is the Paley-Zygmund inequality. It implies that for any even integer k, and any real-valued random variable X,
$$\Pr\!\left[\,|X| \ge \tfrac{1}{2}\big(\mathbb{E}[X^k]\big)^{1/k}\right] \,\ge\, \frac{\mathbb{E}[X^k]^2}{4\,\mathbb{E}[X^{2k}]}.$$
Together with the multinomial theorem, for $X$ a sum of $n$ Rademacher random variables, Paley-Zygmund can get you pretty strong lower bounds. It also works with bounded-independence random variables: for example, you easily get that the sum of $n$ 4-wise independent $\pm 1$ random variables is $\Omega(\sqrt{n})$ with constant probability.
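A concrete instance (a sketch): taking $k = 2$ in the displayed inequality and using the exact moments $\mathbb{E}[X^2] = n$ and $\mathbb{E}[X^4] = 3n^2 - 2n$ of a sum of $n$ Rademacher variables (both of which only need 4-wise independence) gives $\Pr[|X| \ge \sqrt{n}/2] \ge n^2/(4(3n^2 - 2n)) > 1/12$:

```python
import math

def pz_rademacher_bound(n):
    """Paley-Zygmund with k = 2 for X a sum of n Rademacher variables:
    Pr[|X| >= sqrt(n)/2] >= E[X^2]^2 / (4 E[X^4]).
    Only the 2nd and 4th moments enter, so 4-wise independence suffices."""
    m2 = n                  # E[X^2] = n
    m4 = 3 * n * n - 2 * n  # E[X^4] = n + 3n(n-1)
    return m2**2 / (4 * m4)

for n in (10, 100, 10000):
    print(f"n={n}: Pr[|X| >= {math.sqrt(n) / 2:.1f}] >= {pz_rademacher_bound(n):.4f}")
# The bound tends to 1/12 from above: |X| = Omega(sqrt(n)) with constant probability.
```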
If you are indeed okay with bounding sums of Bernoulli trials (and not, say, bounded random variables), the following is pretty tight.

Theorem (Slud's inequality). Let $X \sim \mathrm{Binomial}(n, p)$ and write $q = 1-p$. Suppose (*) either (a) $p \le 1/4$ and $np \le k \le n$, or (b) $np \le k \le nq$. Then
$$\Pr[X \ge k] \,\ge\, 1 - \Phi\!\left(\frac{k - np}{\sqrt{npq}}\right),$$
where $\Phi$ is the standard normal distribution function.
(Treating the argument to Φ as transforming the standard normal, this agrees exactly with what the CLT tells you; in fact, it tells us that Binomials satisfying the conditions of the theorem will dominate their corresponding Gaussians on upper tails.)
From here, you can use bounds on $\Phi$ to get something nicer. For instance, in Feller's first book, in the section on Gaussians, it is shown for every $z > 0$ that
$$\frac{z}{1+z^2}\,\varphi(z) \,<\, 1 - \Phi(z) \,<\, \frac{1}{z}\,\varphi(z),$$
where $\varphi$ is the standard normal density.
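To see the sandwich concretely, here is a quick evaluation using only the Python standard library (a sketch; $1 - \Phi$ is computed via `math.erfc`):

```python
import math

def gaussian_tail_sandwich(z):
    """Feller's bounds for z > 0: z/(1+z^2)*phi(z) < 1 - Phi(z) < phi(z)/z."""
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal density
    tail = 0.5 * math.erfc(z / math.sqrt(2))             # 1 - Phi(z)
    return z / (1 + z * z) * phi, tail, phi / z

for z in (0.5, 1.0, 2.0, 4.0):
    lo, tail, hi = gaussian_tail_sandwich(z)
    print(f"z={z}: {lo:.6f} < {tail:.6f} < {hi:.6f}")
```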
Other than that, and what other people have said, you can also try using the Binomial directly, perhaps with some Stirling.
(*) Some newer statements of Slud's inequality leave out some of these conditions; I've reproduced the one in Slud's paper.
The de Moivre-Laplace Theorem shows that variables like $|T \cap S_1|$, after being suitably normalised and under certain conditions, will converge in distribution to a normal distribution. That's enough if you want constant lower bounds.
For lower bounds like $n^{-c}$, you need a slightly finer tool. Here's one reference I know of (but only by accident; I've never had the opportunity to use such an inequality myself). Some explicit lower bounds on tail probabilities of binomial distributions are given as Theorem 1.5 in the book Random Graphs by Béla Bollobás (Cambridge University Press, 2nd edition), which gives further references to An Introduction to Probability Theory and Its Applications by Feller and Foundations of Probability by Rényi.
The Generalized Littlewood-Offord Theorem isn't exactly what you want, but it gives what I think of as a "reverse Chernoff" bound by showing that the sum of random variables is unlikely to fall within a small range around any particular value (including the expectation). Perhaps it will be useful.
Formally, the theorem is as follows.
Generalized Littlewood-Offord Theorem: Let $a_1, \ldots, a_n$ and $s > 0$ be real numbers such that $|a_i| \ge s$ for $1 \le i \le n$, and let $X_1, \ldots, X_n$ be independent random variables that take values zero and one. For $0 < p \le \frac{1}{2}$, suppose that $p \le \Pr[X_i = 0] \le 1-p$ for all $1 \le i \le n$. Then, for any $r \in \mathbb{R}$,
$$\Pr\Big[\,r \le \sum_{i=1}^{n} a_i X_i < r + s\Big] \,\le\, \frac{c_p}{\sqrt{n}},$$
where $c_p$ is a constant depending only on $p$.
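To see the $1/\sqrt{n}$ behaviour concretely, here is a small Monte Carlo sketch of the special case $a_i = s = 1$ and $\Pr[X_i = 0] = 1/2$, where the event is that a $\mathrm{Binomial}(n, 1/2)$ sum hits one particular value; all parameter choices are just for illustration:

```python
import math, random

def small_ball_estimate(n, trials=100_000):
    """Monte Carlo estimate of Pr[r <= S < r + 1] for S a sum of n fair 0/1
    coins, with r = n // 2 (a_i = s = 1 and p = 1/2 in the theorem)."""
    r = n // 2
    hits = sum(bin(random.getrandbits(n)).count("1") == r for _ in range(trials))
    return hits / trials

for n in (100, 400, 1600):
    print(f"n={n:5d}: estimate={small_ball_estimate(n):.4f}"
          f"  vs  1/sqrt(n)={1 / math.sqrt(n):.4f}")
```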
The exponent in the standard Chernoff bound as it is stated on Wikipedia is tight for 0/1-valued random variables. Let $0 < p < 1$ and let $X_1, X_2, \ldots$ be a sequence of independent random variables such that for each $i$, $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1-p$. Then for every $\varepsilon > 0$,
$$\frac{2^{-D(p+\varepsilon\,\|\,p)\cdot n}}{n+1} \,\le\, \Pr\Big[\sum_{i=1}^{n} X_i \ge (p+\varepsilon)n\Big] \,\le\, 2^{-D(p+\varepsilon\,\|\,p)\cdot n}.$$
Here, $D(x\,\|\,y) = x\log_2(x/y) + (1-x)\log_2\big((1-x)/(1-y)\big)$, which is the Kullback-Leibler divergence between Bernoulli random variables with parameters $x$ and $y$.
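As a quick numerical sanity check (a sketch; the parameters are arbitrary), both sides of the sandwich can be computed exactly for small $n$:

```python
import math

def kl_bernoulli_bits(x, y):
    """D(x || y) in bits for Bernoulli parameters x and y."""
    return x * math.log2(x / y) + (1 - x) * math.log2((1 - x) / (1 - y))

def exact_upper_tail(n, p, eps):
    """Pr[X_1 + ... + X_n >= (p + eps) * n], computed exactly."""
    k0 = math.ceil((p + eps) * n)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k0, n + 1))

n, p, eps = 100, 0.3, 0.105          # arbitrary small example
d = kl_bernoulli_bits(p + eps, p)
tail = exact_upper_tail(n, p, eps)
print(f"{2**(-d * n) / (n + 1):.3e} <= {tail:.3e} <= {2**(-d * n):.3e}")
```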
As mentioned, the upper bound in the inequality above is proved on Wikipedia (https://en.wikipedia.org/wiki/Chernoff_bound) under the name "Chernoff-Hoeffding Theorem, additive form". The lower bound can be proved using e.g. the "method of types". See Lemma II.2 in [1]. Also, this is covered in the classic textbook on information theory by Cover and Thomas.
[1] Imre Csiszár: The Method of Types. IEEE Transactions on Information Theory (1998). http://dx.doi.org/10.1109/18.720546