Say there are $m+n$ elements split into two groups (of sizes $m$ and $n$). The variance of the first group is $\sigma_m^2$ and the variance of the second group is $\sigma_n^2$. The elements themselves are assumed to be unknown, but I know the means $\mu_m$ and $\mu_n$.
Is there a way to calculate the combined variance $\sigma^2_{m+n}$?
The variance doesn't have to be unbiased, so the denominator is $m+n$ and not $m+n-1$.
Answers:
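Use the definitions of mean
$$\mu_{1:n} = \frac{1}{n}\sum_{i=1}^n x_i$$
and sample variance
$$\sigma^2_{1:n} = \frac{1}{n}\sum_{i=1}^n \left(x_i - \mu_{1:n}\right)^2 = \frac{n-1}{n}\left(\frac{1}{n-1}\sum_{i=1}^n \left(x_i - \mu_{1:n}\right)^2\right)$$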
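(the last term in parentheses is the unbiased variance estimator often computed by default in statistical software) to find the sum of squares of all the data $x_i$. Let's order the indexes $i$ so that $i=1,\ldots,n$ designates elements of the first group and $i=n+1,\ldots,n+m$ designates elements of the second group. Break that sum of squares by group and re-express the two pieces in terms of the variances and means of the subsets of the data:
$$\begin{aligned}
(m+n)\left(\sigma^2_{1:m+n} + \mu_{1:m+n}^2\right) &= \sum_{i=1}^{n+m} x_i^2 \\
&= \sum_{i=1}^{n} x_i^2 + \sum_{i=n+1}^{n+m} x_i^2 \\
&= n\left(\sigma^2_{1:n} + \mu_{1:n}^2\right) + m\left(\sigma^2_{1+n:m+n} + \mu_{1+n:m+n}^2\right).
\end{aligned}$$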
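Algebraically solving this for $\sigma^2_{1:m+n}$ in terms of the other (known) quantities yields
$$\sigma^2_{1:m+n} = \frac{n\left(\sigma^2_{1:n} + \mu_{1:n}^2\right) + m\left(\sigma^2_{1+n:m+n} + \mu_{1+n:m+n}^2\right)}{m+n} - \mu_{1:m+n}^2.$$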
Of course, using the same approach, $\mu_{1:m+n} = (n\mu_{1:n} + m\mu_{1+n:m+n})/(m+n)$ can be expressed in terms of the group means, too.
An anonymous contributor points out that when the sample means are equal (so that $\mu_{1:n} = \mu_{1+n:m+n} = \mu_{1:m+n}$), the solution for $\sigma^2_{1:m+n}$ is a weighted mean of the group sample variances, $(n\sigma^2_{1:n} + m\sigma^2_{1+n:m+n})/(m+n)$.
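As a rough check of the approach in this answer, here is a minimal R sketch that combines two groups from their sizes, means, and variances (denominator $n$, as in the question) and compares the result with the variance computed from the raw values. The function names `combined_variance` and `pop_var` and the sample data are made up for illustration.

```r
# Combine two groups' summary statistics into the variance of the pooled data.
# All variances here use the denominator n (not n - 1), as in the question.
combined_variance <- function(n, mu1, var1, m, mu2, var2) {
  mu_all <- (n * mu1 + m * mu2) / (n + m)  # mean of the combined data
  (n * (var1 + mu1^2) + m * (var2 + mu2^2)) / (n + m) - mu_all^2
}

# Variance with denominator n, used only to check against the raw data.
pop_var <- function(x) mean((x - mean(x))^2)

# Hypothetical data, not taken from the question.
x1 <- c(2, 4, 4, 5)
x2 <- c(1, 3, 8)

from_summaries <- combined_variance(length(x1), mean(x1), pop_var(x1),
                                    length(x2), mean(x2), pop_var(x2))
from_raw <- pop_var(c(x1, x2))
print(c(from_summaries = from_summaries, from_raw = from_raw))
```

The two printed values should agree up to floating-point rounding. Because the formula subtracts two squared quantities, it may lose precision when the means are very large relative to the spread.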
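sqrt(weighted.mean(u^2 + rho^2, n) - weighted.mean(u, n)^2)
where `n`, `u` and `rho` are equal-length vectors (the group sizes, means, and standard deviations, respectively). E.g. `n=c(10, 14, 9)` for three samples.

I'm going to use standard notation for sample means and sample variances in this answer, rather than the notation used in the question. Using standard notation, another formula for the pooled sample variance of two groups can be found in O'Neill (2014) (Result 1):
$$s_\text{pooled}^2 = \frac{1}{n_1+n_2-1}\left[(n_1-1)s_1^2 + (n_2-1)s_2^2 + \frac{n_1 n_2}{n_1+n_2}\left(\bar{x}_1 - \bar{x}_2\right)^2\right].$$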
This formula works directly with the underlying sample means and sample variances of the two subgroups, and does not require intermediate calculation of the pooled sample mean. (Proof of result in linked paper.)
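A minimal R sketch of this kind of two-group formula follows, assuming it takes the usual form: the within-group sums of squares plus a between-group term $\frac{n_1 n_2}{n_1+n_2}(\bar{x}_1-\bar{x}_2)^2$, divided by $n_1+n_2-1$. The function name `pooled_sample_variance` and the data are hypothetical. The result is checked against R's `var()`, which uses the $n-1$ denominator.

```r
# Sample variance (n - 1 denominator) of two groups combined, computed from
# each group's size, sample mean, and sample variance only.
pooled_sample_variance <- function(n1, xbar1, s1sq, n2, xbar2, s2sq) {
  within  <- (n1 - 1) * s1sq + (n2 - 1) * s2sq          # within-group sums of squares
  between <- n1 * n2 / (n1 + n2) * (xbar1 - xbar2)^2    # between-group term
  (within + between) / (n1 + n2 - 1)
}

# Hypothetical data, used only to check the formula.
x1 <- c(2, 4, 4, 5)
x2 <- c(1, 3, 8)

pooled <- pooled_sample_variance(length(x1), mean(x1), var(x1),
                                 length(x2), mean(x2), var(x2))
direct <- var(c(x1, x2))
print(c(pooled = pooled, direct = direct))
```

Note that the combined mean never appears: only the difference between the two group means is needed.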
Yes, given the mean, sample count, and variance or standard deviation of each of two or more groups of samples, you can exactly calculate the variance or standard deviation of the combined group.
This web page describes how to do it, and why it works; it also includes source code in Perl: http://www.burtonsys.com/climate/composite_standard_deviations.html
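For readers working in R rather than Perl, here is a hedged sketch for two or more groups. It assumes the inputs are group sizes, means, and standard deviations as returned by `sd()` (denominator $n-1$), rescales them to the denominator-$n$ form, combines them, and converts back. The function name `combine_sd` and the data are illustrative, not taken from the linked page.

```r
# Combine k groups given sizes n, means xbar, and sample SDs s
# (n - 1 denominator, as returned by R's sd()).
# Returns the size, mean, and sample SD of the pooled data.
combine_sd <- function(n, xbar, s) {
  N <- sum(n)
  pop_var <- s^2 * (n - 1) / n              # rescale each group to denominator n
  mean_all <- weighted.mean(xbar, n)        # combined mean
  pop_var_all <- weighted.mean(pop_var + xbar^2, n) - mean_all^2
  list(n = N, mean = mean_all, sd = sqrt(pop_var_all * N / (N - 1)))
}

# Hypothetical data: three groups of different sizes.
groups <- list(c(2, 4, 4, 5), c(1, 3, 8), c(6, 7, 9, 10, 12))

combined <- combine_sd(lengths(groups), sapply(groups, mean), sapply(groups, sd))
all_x <- unlist(groups)
print(c(combined_sd = combined$sd, direct_sd = sd(all_x)))
```

Each group needs at least two elements here, since `sd()` is undefined for a single value.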
BTW, contrary to the answer given above, $n\left(\sigma^2_{1:n} + \mu_{1:n}^2\right) \ne \sum_{i=1}^n x_i^2$.
See for yourself, e.g., in R:
`R` computes the unbiased estimate of the standard deviation rather than the standard deviation of the set of numbers. For instance, `sd(c(-1,1))` returns `1.414214` rather than `1`. Your example needs to use `sqrt(9/10)*sd(x)` in place of `sd(x)`.
n <- 10; x <- rnorm(n,5,2); m <- mean(x); s <- sd(x) * sqrt((n-1)/n); m2 <- sum(x^2); c(lhs=n * (m^2 + s^2), rhs=m2)