[Blindmath] statistical formulas
Jonathan Godfrey
a.j.godfrey at massey.ac.nz
Wed Oct 24 00:24:37 UTC 2012
Hi all,
I've decided to jump in here as I've spotted a small (but crucial) error in
the contributions given thus far. I'd also point out that the lecturing
staff at most universities now have the ability to put Greek letters and
formulae into the text of email messages in the same way they do in Word
documents.
They won't be readable either if done that way.
Correlation is the covariance divided by the square root of the product of
the variances.
For a population, the variance is
Var(x) = Sum[(x-mu)^2]/n
where n is the population size and mu is the population mean. Note that
Sum[] means to sum over all observations.
Expanding that out so that there is no squaring going on would give:
Var(x) = Sum[(x-mu)(x-mu)]/n
If you don't do the division by n then this is the sum of squares, sometimes
shortened to SS or, to show that it belongs to the variable x, S_xx:
S_xx = Sum[(x-mu)(x-mu)]
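For anyone who wants to check the arithmetic by machine, here is a minimal Python sketch of the variance and sum of squares formulas above. The function names and the data values are my own illustrative choices, not anything from the earlier posts.

```python
def sum_of_squares(xs):
    """S_xx: Sum[(x-mu)(x-mu)] with no division by n."""
    mu = sum(xs) / len(xs)  # population mean
    return sum((x - mu) * (x - mu) for x in xs)


def population_variance(xs):
    """Var(x) = Sum[(x-mu)^2]/n, i.e. S_xx divided by the population size."""
    return sum_of_squares(xs) / len(xs)


# Hypothetical data with mean 5, chosen so the arithmetic is easy to follow.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s_xx = sum_of_squares(x)
var_x = population_variance(x)
print(s_xx, var_x)  # 32.0 4.0
```

The deviations from the mean are -3, -1, -1, -1, 0, 0, 2 and 4, so the squares add to 32 and dividing by n = 8 gives a variance of 4.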
A covariance is found using:
Cov(x,y) = Sum[(x-mu_x)(y-mu_y)]/n
Where mu_x and mu_y are the population means of x and y respectively, which
is why the mu gets given the subscripts.
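The covariance formula can be sketched the same way. This is only an illustration under my own assumptions: the function name and the x and y values are made up for the example.

```python
def population_covariance(xs, ys):
    """Cov(x,y) = Sum[(x-mu_x)(y-mu_y)]/n for paired observations."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of observations")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


# Hypothetical paired data: mu_x is 3 and mu_y is 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
cov_xy = population_covariance(x, y)  # cross products sum to 6, so 6/5
cov_xx = population_covariance(x, x)  # covariance of x with itself is Var(x)
print(cov_xy, cov_xx)
```

Feeding the same variable in twice shows that the variance is just the special case of the covariance where y is x.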
The reduction to alternate forms comes because the cross product
S_xy = Sum[(x-mu_x)(y-mu_y)] is the numerator of the covariance, and the
divisions by n cancel between top and bottom. This means we can write the
correlation as:
Cor(x,y) = Cov(x,y)/sqrt[Var(x)Var(y)]
Or
Cor(x,y) = S_xy / sqrt[S_xx S_yy]
Another notation uses the fact that the square root of the variance is the
standard deviation. This means that we see the correlation expressed as the
covariance divided by the product of the standard deviations.
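As a quick numerical check, the following Python sketch computes the correlation in all three ways described above and they should agree up to rounding. The helper names and the data are hypothetical, chosen only for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cross_product(xs, ys):
    """S_xy = Sum[(x-mu_x)(y-mu_y)]; passing the same list twice gives S_xx."""
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))


# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

s_xx, s_yy, s_xy = cross_product(x, x), cross_product(y, y), cross_product(x, y)
var_x, var_y, cov_xy = s_xx / n, s_yy / n, s_xy / n

# Cor(x,y) = Cov(x,y)/sqrt[Var(x)Var(y)]
cor_from_variances = cov_xy / math.sqrt(var_x * var_y)
# Cor(x,y) = S_xy / sqrt[S_xx S_yy]
cor_from_sums = s_xy / math.sqrt(s_xx * s_yy)
# Covariance divided by the product of the standard deviations
cor_from_sds = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

print(cor_from_variances, cor_from_sums, cor_from_sds)
```

For this data S_xy is 6, S_xx is 10 and S_yy is 6, so each version works out to 6 divided by the square root of 60.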
Mathematically it's all the same. The expression using the sums of squares
and cross products works equally well for samples and populations.
Remember that the division is by (n-1) for samples for both covariances and
variances.
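To illustrate the difference the divisor makes, here is a small Python sketch comparing division by n with division by (n-1). It treats the mean of the data as the sample mean in place of mu, which is my assumption for the sample case, and the data values are again made up. The variances differ, but the correlation comes out the same because the divisor cancels.

```python
import math
import statistics


def cross_product(xs, ys):
    """Sum of cross products of deviations from the means (S_xy, or S_xx if ys is xs)."""
    m_x = sum(xs) / len(xs)
    m_y = sum(ys) / len(ys)
    return sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))


# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
s_xx, s_yy, s_xy = cross_product(x, x), cross_product(y, y), cross_product(x, y)

pop_var_x = s_xx / n           # population form, divide by n
sample_var_x = s_xx / (n - 1)  # sample form, divide by (n-1)

# The standard library agrees: pvariance divides by n, variance by (n-1).
print(pop_var_x, statistics.pvariance(x))
print(sample_var_x, statistics.variance(x))

cor_population = (s_xy / n) / math.sqrt((s_xx / n) * (s_yy / n))
cor_sample = (s_xy / (n - 1)) / math.sqrt((s_xx / (n - 1)) * (s_yy / (n - 1)))
print(cor_population, cor_sample)
```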
Hope this helps.
Jonathan