Is there a reverse Chernoff bound, one showing that the tail probability is at least so large?

i.e., if $X_1, X_2, \ldots, X_n$
pr.probability
chernoff-bound
Ashwinkumar BV
Answers:
Here is an explicit proof that a standard Chernoff bound is tight up to constant factors in the exponent for a particular range of the parameters. (In particular, whenever the variables are 0 or 1, and 1 with probability 1/2 or less, and $\epsilon \in (0,1/2)$, and the Chernoff upper bound is less than a constant.)
If you find a mistake, please let me know.
Lemma 1. (tightness of Chernoff bound) Let $X$ be the average of $k$ independent, 0/1 random variables (r.v.).
For any $\epsilon \in (0,1/2]$ and $p \in (0,1/2]$, assuming $\epsilon^2 pk \ge 3$,

(i) If each r.v. is 1 with probability at most $p$, then
$$\Pr[X \le (1-\epsilon)p] \,\ge\, \exp(-9\epsilon^2 pk).$$

(ii) If each r.v. is 1 with probability at least $p$, then
$$\Pr[X \ge (1+\epsilon)p] \,\ge\, \exp(-9\epsilon^2 pk).$$
Proof. We use the following observation:
Claim 1. If $1 \le \ell \le k-1$, then
$$\binom{k}{\ell} \,\ge\, \frac{1}{e\sqrt{2\pi\ell}}\left(\frac{k}{\ell}\right)^{\ell}\left(\frac{k}{k-\ell}\right)^{k-\ell}.$$
Proof of Claim 1. By Stirling's approximation, $i! = \sqrt{2\pi i}\,(i/e)^i e^{\lambda}$ where $\lambda \in \big[\tfrac{1}{12i+1}, \tfrac{1}{12i}\big]$.
Thus $\binom{k}{\ell}$, which is $\frac{k!}{\ell!\,(k-\ell)!}$, is at least
$$\frac{\sqrt{2\pi k}\,\big(\tfrac{k}{e}\big)^k}{\sqrt{2\pi \ell}\,\big(\tfrac{\ell}{e}\big)^{\ell}\,\sqrt{2\pi (k-\ell)}\,\big(\tfrac{k-\ell}{e}\big)^{k-\ell}}\,\exp\Big(\frac{1}{12k+1} - \frac{1}{12\ell} - \frac{1}{12(k-\ell)}\Big).$$
Since $k \ge k-\ell$, the ratio of the square-root factors is at least $1/\sqrt{2\pi\ell}$, and since $\ell, k-\ell \ge 1$ the exponential correction is at least $e^{-1/6} \ge e^{-1}$; the powers of $e$ cancel to give $(k/\ell)^{\ell}(k/(k-\ell))^{k-\ell}$, which gives the claim. QED
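Claim 1 is also easy to spot-check numerically; here is a small Python sketch (the parameter grid is arbitrary, and logs are used to keep the numbers in range):

```python
import math

# Spot-check Claim 1:
#   comb(k, l) >= (k/l)^l * (k/(k-l))^(k-l) / (e * sqrt(2*pi*l))
# compared in log form to avoid overflow for large k.
for k in (10, 100, 1000):
    for l in range(1, k):
        lhs = math.log(math.comb(k, l))
        rhs = (-1 - 0.5 * math.log(2 * math.pi * l)
               + l * math.log(k / l) + (k - l) * math.log(k / (k - l)))
        assert lhs >= rhs, (k, l)
print("Claim 1 holds on the sampled range")
```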
Proof of Lemma 1 Part (i). Without loss of generality assume each 0/1 random variable in the sum $X$
is 1 with probability exactly $p$.
Note $\Pr[X \le (1-\epsilon)p]$ equals the sum $\sum_{i=0}^{\lfloor(1-\epsilon)pk\rfloor} \Pr[X = i/k]$,
and $\Pr[X = i/k] = \binom{k}{i} p^i (1-p)^{k-i}$.
Fix $\ell = \lfloor(1-2\epsilon)pk\rfloor + 1$.
The terms in the sum are increasing,
so the terms with index $i \ge \ell$
each have value at least $\Pr[X = \ell/k]$,
so their sum has total value at least
$(\epsilon pk - 2)\,\Pr[X = \ell/k]$ (there are at least $\lfloor(1-\epsilon)pk\rfloor - \ell + 1 \ge \epsilon pk - 2$ such terms).
To complete the proof, we show that
$$(\epsilon pk - 2)\,\Pr[X = \ell/k] \,\ge\, \exp(-9\epsilon^2 pk).$$
The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$
give $\epsilon pk \ge 6$, so $\epsilon pk - 2 \ge \frac{2}{3}\epsilon pk$,
and the left-hand side above is at least $\frac{2}{3}\,\epsilon pk \binom{k}{\ell} p^{\ell} (1-p)^{k-\ell}$.
Using Claim 1 to bound $\binom{k}{\ell}$,
this is in turn at least $AB$
where
$$A = \frac{2\,\epsilon pk}{3e\sqrt{2\pi\ell}}$$
and
$$B = \left(\frac{k}{\ell}\right)^{\ell}\left(\frac{k}{k-\ell}\right)^{k-\ell} p^{\ell}\,(1-p)^{k-\ell}.$$
To finish we show $A \ge \exp(-\epsilon^2 pk)$ and $B \ge \exp(-8\epsilon^2 pk)$.
Claim 2. $A \ge \exp(-\epsilon^2 pk)$.
Proof of Claim 2. The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$
imply
(i) $pk \ge 12$.
By definition, $\ell \le pk + 1$.
By (i), $pk + 1 \le 1.1\,pk$.
Thus, (ii) $\ell \le 1.1\,pk$.
Substituting the right-hand side of (ii) for $\ell$ in $A$ gives
(iii) $A \ge \frac{2}{3e}\,\epsilon\sqrt{pk/(2.2\pi)}$.
The assumption $\epsilon^2 pk \ge 3$
implies $\epsilon\sqrt{pk} \ge \sqrt{3}$,
which with (iii) gives (iv) $A \ge \frac{2}{3e}\sqrt{3/(2.2\pi)} \ge 0.1$.
From $\epsilon^2 pk \ge 3$ it follows that (v) $\exp(-\epsilon^2 pk) \le \exp(-3) \le 0.05$.
(iv) and (v) together give the claim. QED
Claim 3. $B \ge \exp(-8\epsilon^2 pk)$.
Proof of Claim 3. Fix $\delta$ such that $\ell = (1-\delta)pk$. The choice of $\ell$ implies $\delta \le 2\epsilon$,
so
the claim will hold as long as $B \ge \exp(-2\delta^2 pk)$.
Taking each side of this latter inequality to the power $-1/\ell$ and simplifying,
it is equivalent to
$$\frac{\ell}{pk}\left(\frac{k-\ell}{(1-p)k}\right)^{k/\ell - 1} \,\le\, \exp\Big(\frac{2\delta^2 pk}{\ell}\Big).$$
The choice of $\ell$ gives $0 < \delta < 1$; substituting $\ell = (1-\delta)pk$ above, taking logarithms, and using $\ln(1+z) \le z$, $\ln(1-\delta) \le -\delta - \delta^2/2$, and $p \le 1/2$ verifies the inequality, proving Claim 3. QED
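As a side note, the equivalent inequality above is easy to spot-check numerically over the relevant range $p \le 1/2$, $0 < \delta < 1$ (a sketch; it treats $\ell = (1-\delta)pk$ as a real number, and the sampled grid is arbitrary):

```python
import math

def claim3_holds(k, p, delta):
    """Check the inequality equivalent to Claim 3 at l = (1-delta)*p*k,
    comparing logarithms of the two sides."""
    l = (1 - delta) * p * k
    lhs = math.log(l / (p * k)) + (k / l - 1) * math.log((k - l) / ((1 - p) * k))
    rhs = 2 * delta**2 * p * k / l
    return lhs <= rhs

assert all(claim3_holds(k, p, d)
           for k in (100, 1000, 10000)
           for p in (0.05, 0.25, 0.5)
           for d in (0.01, 0.1, 0.5, 0.9))
print("Claim 3's equivalent inequality holds on the sampled grid")
```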
Claims 2 and 3 imply $AB \ge \exp(-\epsilon^2 pk)\exp(-8\epsilon^2 pk) = \exp(-9\epsilon^2 pk)$.
This implies part (i) of the lemma.
Proof of Lemma 1 Part (ii). Without loss of generality assume each random variable is 1 with probability exactly $p$.
Note $\Pr[X \ge (1+\epsilon)p] = \sum_{i=\lceil(1+\epsilon)pk\rceil}^{k} \Pr[X = i/k]$.
Fix $\hat\ell = \lceil(1+2\epsilon)pk\rceil - 1$.
The first terms in the sum, those with index $i \le \hat\ell$, each have value at least $\Pr[X = \hat\ell/k]$ (the terms are decreasing in this range), and there are at least $\epsilon pk - 2$ of them,
so they total at least $(\epsilon pk - 2)\,\Pr[X = \hat\ell/k]$,
which is at least $\exp(-9\epsilon^2 pk)$.
(The proof of that is the same as for (i), except with $\ell$ replaced by $\hat\ell$
and $\delta$ replaced by $-\hat\delta$, where $\hat\ell = (1+\hat\delta)pk$.)
QED
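For what it's worth, here is a minimal Python sanity check of part (i), comparing the exact binomial lower tail against $\exp(-9\epsilon^2 pk)$ at a few parameter settings satisfying the hypothesis $\epsilon^2 pk \ge 3$ (the helper names are just illustrative):

```python
import math

def log_binom_pmf(k, i, p):
    """log Pr[Bin(k, p) = i], via lgamma to avoid huge intermediate values."""
    return (math.lgamma(k + 1) - math.lgamma(i + 1) - math.lgamma(k - i + 1)
            + i * math.log(p) + (k - i) * math.log(1 - p))

def lower_tail(k, p, eps):
    """Pr[X <= (1 - eps) * p] for X the average of k i.i.d. Bernoulli(p) bits."""
    cutoff = math.floor((1 - eps) * p * k)
    return sum(math.exp(log_binom_pmf(k, i, p)) for i in range(cutoff + 1))

# Part (i): the exact tail should dominate exp(-9 eps^2 p k).
for k, p, eps in [(1000, 0.5, 0.08), (2000, 0.25, 0.08), (500, 0.1, 0.25)]:
    assert eps**2 * p * k >= 3          # the lemma's hypothesis
    tail, bound = lower_tail(k, p, eps), math.exp(-9 * eps**2 * p * k)
    print(f"k={k}, p={p}, eps={eps}: tail={tail:.3e} >= bound={bound:.3e}")
```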
The Berry-Esseen theorem can give tail probability lower bounds, as long as they are larger than $n^{-1/2}$.
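For instance, for $S_n$ a sum of $n$ i.i.d. Rademacher variables (variance 1, third absolute moment 1), Berry-Esseen gives $\Pr[S_n/\sqrt{n} \ge z] \ge 1 - \Phi(z) - C/\sqrt{n}$. A minimal sketch, assuming $C \le 0.4748$ (one published bound on the Berry-Esseen constant; the function names are illustrative):

```python
import math

def normal_upper_tail(z):
    """1 - Phi(z), computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def rademacher_tail_lower_bound(n, z, C=0.4748):
    """Berry-Esseen lower bound on Pr[(X_1 + ... + X_n)/sqrt(n) >= z] for
    i.i.d. Rademacher X_i; C = 0.4748 is an assumed bound on the constant."""
    return normal_upper_tail(z) - C / math.sqrt(n)

# Only informative once C/sqrt(n) is below the Gaussian tail itself:
n, z = 10**6, 2.0
print(rademacher_tail_lower_bound(n, z))  # ~0.0223, close to 1 - Phi(2) ~ 0.0228
```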
Another tool you can use is the Paley-Zygmund inequality. It implies that for any even integer k, and any real-valued random variable X,
$$\Pr\!\left[\,|X| \ge \tfrac{1}{2}\big(\mathbb{E}[X^k]\big)^{1/k}\right] \,\ge\, \frac{\mathbb{E}[X^k]^2}{4\,\mathbb{E}[X^{2k}]}.$$
Together with the multinomial theorem, for $X$ a sum of $n$ Rademacher random variables, Paley-Zygmund can get you pretty strong lower bounds. It also works with bounded-independence random variables: for example, you easily get that the sum of $n$ 4-wise independent $\pm 1$ random variables is $\Omega(\sqrt{n})$ with constant probability.
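A concrete instance (a sketch): taking $k = 2$ in the displayed inequality and using the exact moments $\mathbb{E}[X^2] = n$ and $\mathbb{E}[X^4] = 3n^2 - 2n$ of a sum of $n$ Rademacher variables (both of which only need 4-wise independence) gives $\Pr[|X| \ge \sqrt{n}/2] \ge n^2/(4(3n^2 - 2n)) > 1/12$:

```python
import math

def pz_rademacher_bound(n):
    """Paley-Zygmund with k = 2 for X a sum of n Rademacher variables:
    Pr[|X| >= sqrt(n)/2] >= E[X^2]^2 / (4 E[X^4]).
    Only the 2nd and 4th moments enter, so 4-wise independence suffices."""
    m2 = n                  # E[X^2] = n
    m4 = 3 * n * n - 2 * n  # E[X^4] = n + 3n(n-1)
    return m2**2 / (4 * m4)

for n in (10, 100, 10000):
    print(f"n={n}: Pr[|X| >= {math.sqrt(n) / 2:.1f}] >= {pz_rademacher_bound(n):.4f}")
# The bound tends to 1/12 from above: |X| = Omega(sqrt(n)) with constant probability.
```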
If you are indeed okay with bounding sums of Bernoulli trials (and not, say, bounded random variables), the following is pretty tight.

Theorem (Slud's inequality). Let $X \sim \mathrm{Binomial}(n, p)$ and write $q = 1-p$. Suppose (*) either (a) $p \le 1/4$ and $np \le k \le n$, or (b) $np \le k \le nq$. Then
$$\Pr[X \ge k] \,\ge\, 1 - \Phi\!\left(\frac{k - np}{\sqrt{npq}}\right),$$
where $\Phi$ is the standard normal distribution function.
(Treating the argument to Φ as transforming the standard normal, this agrees exactly with what the CLT tells you; in fact, it tells us that Binomials satisfying the conditions of the theorem will dominate their corresponding Gaussians on upper tails.)
From here, you can use bounds on $\Phi$ to get something nicer. For instance, in Feller's first book, in the section on Gaussians, it is shown for every $z > 0$ that
$$\frac{z}{1+z^2}\,\varphi(z) \,<\, 1 - \Phi(z) \,<\, \frac{1}{z}\,\varphi(z),$$
where $\varphi$ is the standard normal density.
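To see the sandwich concretely, here is a quick evaluation using only the Python standard library (a sketch; $1 - \Phi$ is computed via `math.erfc`):

```python
import math

def gaussian_tail_sandwich(z):
    """Feller's bounds for z > 0: z/(1+z^2)*phi(z) < 1 - Phi(z) < phi(z)/z."""
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal density
    tail = 0.5 * math.erfc(z / math.sqrt(2))             # 1 - Phi(z)
    return z / (1 + z * z) * phi, tail, phi / z

for z in (0.5, 1.0, 2.0, 4.0):
    lo, tail, hi = gaussian_tail_sandwich(z)
    print(f"z={z}: {lo:.6f} < {tail:.6f} < {hi:.6f}")
```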
Other than that, and what other people have said, you can also try using the Binomial directly, perhaps with some Stirling.
(*) Some newer statements of Slud's inequality leave out some of these conditions; I've reproduced the one in Slud's paper.
The de Moivre-Laplace Theorem shows that variables like $|T \cap S_1|$, after being suitably normalised and under certain conditions, will converge in distribution to a normal distribution. That's enough if you want constant lower bounds.
For lower bounds like $n^{-c}$, you need a slightly finer tool. Here's one reference I know of (but only by accident; I've never had the opportunity to use such an inequality myself). Some explicit lower bounds on tail probabilities of binomial distributions are given as Theorem 1.5 in the book Random Graphs by Béla Bollobás (Cambridge University Press, 2nd edition), which gives further references to An Introduction to Probability Theory and Its Applications by Feller and Foundations of Probability by Rényi.
The Generalized Littlewood-Offord Theorem isn't exactly what you want, but it gives what I think of as a "reverse Chernoff" bound by showing that the sum of random variables is unlikely to fall within a small range around any particular value (including the expectation). Perhaps it will be useful.
Formally, the theorem is as follows.
Generalized Littlewood-Offord Theorem: Let $a_1, \ldots, a_n$ and $s > 0$ be real numbers such that $|a_i| \ge s$ for $1 \le i \le n$, and let $X_1, \ldots, X_n$ be independent random variables that take values zero and one. For $0 < p \le \frac{1}{2}$, suppose that $p \le \Pr[X_i = 0] \le 1-p$ for all $1 \le i \le n$. Then, for any $r \in \mathbb{R}$,
$$\Pr\Big[\,r \le \sum_{i=1}^{n} a_i X_i < r + s\Big] \,\le\, \frac{c_p}{\sqrt{n}},$$
where $c_p$ is a constant depending only on $p$.
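To see the $1/\sqrt{n}$ behaviour concretely, here is a small Monte Carlo sketch of the special case $a_i = s = 1$ and $\Pr[X_i = 0] = 1/2$, where the event is that a $\mathrm{Binomial}(n, 1/2)$ sum hits one particular value; all parameter choices are just for illustration:

```python
import math, random

def small_ball_estimate(n, trials=100_000):
    """Monte Carlo estimate of Pr[r <= S < r + 1] for S a sum of n fair 0/1
    coins, with r = n // 2 (a_i = s = 1 and p = 1/2 in the theorem)."""
    r = n // 2
    hits = sum(bin(random.getrandbits(n)).count("1") == r for _ in range(trials))
    return hits / trials

for n in (100, 400, 1600):
    print(f"n={n:5d}: estimate={small_ball_estimate(n):.4f}"
          f"  vs  1/sqrt(n)={1 / math.sqrt(n):.4f}")
```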
The exponent in the standard Chernoff bound as it is stated on Wikipedia is tight for 0/1-valued random variables. Let $0 < p < 1$ and let $X_1, X_2, \ldots$ be a sequence of independent random variables such that for each $i$, $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1-p$. Then for every $\varepsilon > 0$,
$$\frac{2^{-D(p+\varepsilon\,\|\,p)\cdot n}}{n+1} \,\le\, \Pr\Big[\sum_{i=1}^{n} X_i \ge (p+\varepsilon)n\Big] \,\le\, 2^{-D(p+\varepsilon\,\|\,p)\cdot n}.$$
Here, $D(x\,\|\,y) = x\log_2(x/y) + (1-x)\log_2\big((1-x)/(1-y)\big)$, which is the Kullback-Leibler divergence between Bernoulli random variables with parameters $x$ and $y$.
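As a quick numerical sanity check (a sketch; the parameters are arbitrary), both sides of the sandwich can be computed exactly for small $n$:

```python
import math

def kl_bernoulli_bits(x, y):
    """D(x || y) in bits for Bernoulli parameters x and y."""
    return x * math.log2(x / y) + (1 - x) * math.log2((1 - x) / (1 - y))

def exact_upper_tail(n, p, eps):
    """Pr[X_1 + ... + X_n >= (p + eps) * n], computed exactly."""
    k0 = math.ceil((p + eps) * n)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k0, n + 1))

n, p, eps = 100, 0.3, 0.105          # arbitrary small example
d = kl_bernoulli_bits(p + eps, p)
tail = exact_upper_tail(n, p, eps)
print(f"{2**(-d * n) / (n + 1):.3e} <= {tail:.3e} <= {2**(-d * n):.3e}")
```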
As mentioned, the upper bound in the inequality above is proved on Wikipedia (https://en.wikipedia.org/wiki/Chernoff_bound) under the name "Chernoff-Hoeffding Theorem, additive form". The lower bound can be proved using e.g. the "method of types". See Lemma II.2 in [1]. Also, this is covered in the classic textbook on information theory by Cover and Thomas.
[1] Imre Csiszár: The Method of Types. IEEE Transactions on Information Theory (1998). http://dx.doi.org/10.1109/18.720546