

Formulas for Standard Deviation


Standard deviation measures how spread out the values in a set of data are. We use the Greek letter \(\sigma\) (sigma) as the symbol for standard deviation and calculate it using the following formula:

\(\sigma = \sqrt{\dfrac{1}{N} \sum_{i = 1}^N (x_i - \mu)^2}\)

Now, that formula looks pretty scary, doesn't it? Don't worry: we're going to work through what it actually says. To calculate the standard deviation of a list of numbers, we:

  1. Find the mean (average) of the values. The formula calls this \(\mu\).
  2. Subtract the mean from each value and square the result to give the squared difference.
  3. Find the average of the squared differences. That's the variance. This is why the formula divides by \(N\), the number of values in our list.
  4. Take the square root to give the standard deviation.
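If you'd like to see the four steps written out as a computation, here is a minimal Python sketch. The function name `population_sd` and the small example list are hypothetical, chosen only for illustration, and the function assumes a non-empty list of numbers.

```python
import math


def population_sd(values):
    """Population standard deviation (sigma), following steps 1-4 above.

    Assumes `values` is a non-empty list of numbers.
    """
    n = len(values)                                  # N, the number of values
    mu = sum(values) / n                             # Step 1: the mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # Step 2: squared differences
    variance = sum(squared_diffs) / n                # Step 3: their average (the variance)
    return math.sqrt(variance)                       # Step 4: the square root


# Hypothetical data: mean 5, squared differences sum to 32, variance 32 / 8 = 4
print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Each line of the function matches one numbered step, so you can compare it line by line with the formula.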

The formula actually tells you to take all of those steps. Let's see how. We'll work through it with an example.

A Worked Example

Lucy, our Cavalier King Charles Spaniel puppy, has lots of brothers and sisters. Her mum has had 16 puppies altogether. Jasmin has decided to work out the standard deviation of the birth weights of all of these puppies.


The birth weights of the 16 puppies are:

113 g, 128 g, 233 g, 212 g, 241 g, 135 g, 119 g, 237 g, 240 g, 156 g, 162 g, 171 g, 182 g, 164 g, 168 g, 174 g
Step 1: Find the mean \(\mu\) of these weights. Add them all up and divide by 16:
\( \begin{align*} \mu &= \dfrac{113 + 128 + 233 + 212 + 241 + 135 + 119 + 237 + 240 + 156 + 162 + 171 + 182 + 164 + 168 + 174}{16}\\ &= \dfrac{2835}{16}\\ \mu &= 177.1875 \text{ g} \end{align*} \)
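A quick way to double-check Step 1 is to let Python add up the weights. This is just a sketch using the 16 weights listed above. It shows that the total is 2835 g, so the mean is 2835 / 16.

```python
# Birth weights of the 16 puppies, in grams (from the list above)
weights = [113, 128, 233, 212, 241, 135, 119, 237,
           240, 156, 162, 171, 182, 164, 168, 174]

total = sum(weights)       # numerator of the fraction
mu = total / len(weights)  # divide by N = 16
print(total, mu)           # 2835 177.1875
```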

Step 2: Subtract the mean from each number in your data set and square the result to give the squared difference. The part of the formula that says

\((x_i - \mu)^2\)
refers to this step.

The \(x_i\)s range through all the individual data values (113, 128, 233, 212 and so on), one at a time. We call one of them \(x_1\), say 113, another \(x_2\), say 128, and so on up to \(x_{16}\).

Our squared differences (correct to 3 decimal places) are:

  • \((113 - 177.1875)^2 = 4120.035\)
  • \((128 - 177.1875)^2 = 2419.410\)
  • \((233 - 177.1875)^2 = 3115.035\)
  • \((212 - 177.1875)^2 = 1211.910\)
  • \((241 - 177.1875)^2 = 4072.035\)
  • \((135 - 177.1875)^2 = 1779.785\)
  • \((119 - 177.1875)^2 = 3385.785\)
  • \((237 - 177.1875)^2 = 3577.535\)
  • \((240 - 177.1875)^2 = 3945.410\)
  • \((156 - 177.1875)^2 = 448.910\)
  • \((162 - 177.1875)^2 = 230.660\)
  • \((171 - 177.1875)^2 = 38.285\)
  • \((182 - 177.1875)^2 = 23.160\)
  • \((164 - 177.1875)^2 = 173.910\)
  • \((168 - 177.1875)^2 = 84.410\)
  • \((174 - 177.1875)^2 = 10.160\)

Step 3: Find the average of the squared differences. Add them all up and divide by \(N = 16\), the number of data values.

This is the reason for the funny notation in the formula:

\(\sum_{i = 1}^N (x_i - \mu)^2\)
This is called summation notation, and it just means "add up". We add all the values \((x_1 - 177.1875)^2, (x_2 - 177.1875)^2, \dots, (x_N - 177.1875)^2\) and then divide by \(N\). Let's find the sum first. We found all the values we're adding in the previous step:
\(\sum_{i = 1}^N (x_i - \mu)^2 = 4120.035 + 2419.410 + 3115.035 + 1211.910 + 4072.035 + 1779.785 + 3385.785 + 3577.535 + 3945.410 + 448.910 + 230.660 + 38.285 + 23.160 + 173.910 + 84.410 + 10.160 = 28636.435\)
Next, divide by \(16\) to find the mean:
\(\dfrac{1}{16}\sum_{i = 1}^N (x_i - \mu)^2 = \dfrac{28636.435}{16} = 1789.777\)
This is the variance of our data set.

Step 4: Finally, take the square root:

\( \begin{align*} \sigma &= \sqrt{\dfrac{1}{N} \sum_{i = 1}^N (x_i - \mu)^2}\\ &= \sqrt{1789.777}\\ &= 42.31\;\;\;\;\;\; (2 \text{ decimal places}) \end{align*} \)
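Steps 2 to 4 for the puppy weights can be checked with the following Python sketch. It recomputes every squared difference without rounding, so the total comes out as exactly 28636.4375. The hand calculation gives 28636.435 because each squared difference was rounded to 3 decimal places first. The variance and standard deviation agree after rounding.

```python
import math

weights = [113, 128, 233, 212, 241, 135, 119, 237,
           240, 156, 162, 171, 182, 164, 168, 174]  # grams
n = len(weights)
mu = sum(weights) / n                              # 177.1875

# Step 2: squared differences (x_i - mu)^2
squared_diffs = [(x - mu) ** 2 for x in weights]

# Step 3: the summation, then divide by N to get the variance
total = sum(squared_diffs)
variance = total / n

# Step 4: the square root gives the population standard deviation
sigma = math.sqrt(variance)

print(round(squared_diffs[11], 3))  # 38.285 (the 171 g puppy)
print(round(total, 4), round(variance, 3), round(sigma, 2))  # 28636.4375 1789.777 42.31
```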

Standard Deviation of a Sample

Sometimes our data set is only a sample of the entire population. This might be because it is impossible to collect data for the whole population, or because doing so would be too expensive or time-consuming.

The idea is to use data from a small subset of the population to estimate values, such as the mean and standard deviation, for the whole population.

Using a sample can often give us a good idea of what's going on in the whole population, but it does introduce errors into our values, called sampling errors.

The formula for the standard deviation of a sample is slightly different from the one for the whole population. Here \(N\) is the number of values in the sample, and instead of dividing by \(N\) we divide by \(N - 1\) (the sample size minus 1). Dividing by the smaller number makes the result slightly larger, which compensates for the fact that values in a sample tend to sit closer to their own mean than to the true population mean. The notation also changes slightly: we call the sample standard deviation \(s\) and the sample mean \(\overline{x}\). So the formula looks like this:

\(s = \sqrt{\dfrac{1}{N-1}\sum_{i = 1}^N (x_i - \overline{x})^2}\)
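Here is the sample formula as a minimal Python sketch, for comparison with the population version. The only change is the divisor \(N - 1\), sometimes called Bessel's correction. The name `sample_sd` and the example values are hypothetical. The sketch assumes at least two values, because with one value \(N - 1 = 0\).

```python
import math


def sample_sd(values):
    """Sample standard deviation s: divide by N - 1 instead of N."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    x_bar = sum(values) / n                          # sample mean
    squared_diffs = [(x - x_bar) ** 2 for x in values]
    sample_variance = sum(squared_diffs) / (n - 1)   # the only change: N - 1
    return math.sqrt(sample_variance)


# Hypothetical data: mean 3, squared differences 4 + 0 + 4 = 8, and 8 / 2 = 4
print(sample_sd([1, 3, 5]))  # 2.0
```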

Our Puppy Example with a Sample

Suppose we only know the weights of 5 of the 16 puppies: 113 g, 240 g, 171 g, 182 g and 174 g.

Then our population is all 16 puppies.

Our sample is the 5 puppies that we know the weights for.

We can use the formula for sample standard deviation to estimate the standard deviation of the entire population. Let's complete the steps to calculate the value:

Step 1: Find the mean of the sample values.

\(\overline{x} = \dfrac{113 + 240 + 171 + 182 + 174}{5} = \dfrac{880}{5} =176 \)
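As with the full data set, Step 1 for the sample can be checked in a couple of lines of Python. This is a sketch using the five weights listed above.

```python
# The 5 sampled weights, in grams
sample = [113, 240, 171, 182, 174]

x_bar = sum(sample) / len(sample)  # 880 / 5
print(sum(sample), x_bar)          # 880 176.0
```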

Step 2: Subtract the mean from each weight and square the result.

  • \((113 - 176)^2 = 3969\)
  • \((240 - 176)^2 = 4096\)
  • \((171 - 176)^2 = 25\)
  • \((182 - 176)^2 = 36\)
  • \((174 - 176)^2 = 4\)
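The squared differences for the sample, and their total, can be reproduced with this short Python sketch, which repeats Step 2 for the five weights.

```python
sample = [113, 240, 171, 182, 174]  # grams
x_bar = sum(sample) / len(sample)   # 176.0

# Subtract the sample mean from each weight and square the result
squared_diffs = [(x - x_bar) ** 2 for x in sample]
print(squared_diffs)       # [3969.0, 4096.0, 25.0, 36.0, 4.0]
print(sum(squared_diffs))  # 8130.0
```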

Step 3: Find the "mean" of the squared differences. Don't forget to divide by \(N - 1\) as we're working with a sample.

\( \begin{align*} \text{Sum} &= 3969 + 4096 + 25 + 36 + 4 = 8130\\ \dfrac{8130}{N - 1} &= \dfrac{8130}{5 - 1} = \dfrac{8130}{4} = 2032.5 \end{align*} \)
This value is called the sample variance.

Step 4: Take the square root of the sample variance. This is the sample standard deviation:

\( \begin{align*} s &=\sqrt{\dfrac{1}{N-1}\sum_{i = 1}^N (x_i - \overline{x})^2}\\ &= \sqrt{2032.5} = 45.083\dots \end{align*} \)
So, the sample standard deviation is \(s \approx 45.083\).

Comparison

When we used the whole population, the mean was \(177.1875\), and the standard deviation was \(42.31\).

When we used the sample, the mean was \(176\), and the standard deviation was \(45.083\).

The sample mean was off by less than \(1\%\) (about \(0.67\%\)), and the sample standard deviation was off by about \(6.6\%\).
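Here is a sketch of how the comparison figures can be reproduced in Python. We assume "off by" means the absolute difference divided by the population value, times 100. That is our reading of the comparison, not a formula from the text.

```python
import math

# All 16 birth weights (the population) and the 5 known weights (the sample), in grams
population = [113, 128, 233, 212, 241, 135, 119, 237,
              240, 156, 162, 171, 182, 164, 168, 174]
sample = [113, 240, 171, 182, 174]

# Population: divide by N
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Sample: divide by N - 1
x_bar = sum(sample) / len(sample)
s_variance = sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1)
s = math.sqrt(s_variance)

# Percentage difference relative to the population value (assumed measure)
mean_error = abs(x_bar - mu) / mu * 100
sd_error = abs(s - sigma) / sigma * 100

print(s_variance, round(s, 3))                   # 2032.5 45.083
print(round(mean_error, 2), round(sd_error, 1))  # 0.67 6.6
```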

Summary

The population standard deviation is given by the formula

\(\sigma = \sqrt{\dfrac{1}{N} \sum_{i = 1}^N (x_i - \mu)^2}\)
and the sample standard deviation is given by the formula
\(s = \sqrt{\dfrac{1}{N-1} \sum_{i = 1}^N (x_i - \overline{x})^2}\)
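In practice you rarely need to code these formulas yourself. Python's standard `statistics` module provides both versions: `pvariance` and `pstdev` divide by \(N\), and `variance` and `stdev` divide by \(N - 1\). The sketch below checks them against the puppy results worked out by hand.

```python
import statistics

population = [113, 128, 233, 212, 241, 135, 119, 237,
              240, 156, 162, 171, 182, 164, 168, 174]  # all 16 puppies (g)
sample = [113, 240, 171, 182, 174]                     # the 5 known weights (g)

# pvariance / pstdev use the population formula (divide by N)
print(statistics.pvariance(population))         # 1789.77734375
print(round(statistics.pstdev(population), 2))  # 42.31

# variance / stdev use the sample formula (divide by N - 1)
print(statistics.variance(sample))              # 2032.5
print(round(statistics.stdev(sample), 3))       # 45.083
```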

Description

This chapter series is on Data and is suitable for Year 10 or higher students. Topics include:

  • Accuracy and Precision
  • Calculating Means From Frequency Tables
  • Correlation
  • Cumulative Tables and Graphs
  • Discrete and Continuous Data
  • Finding the Mean
  • Finding the Median
  • Finding the Mode
  • Formulas for Standard Deviation
  • Grouped Frequency Distribution
  • Normal Distribution
  • Outliers
  • Quartiles
  • Quincunx
  • Quincunx Explained
  • Range (Statistics)
  • Skewed Data
  • Standard Deviation and Variance
  • Standard Normal Table
  • Univariate and Bivariate Data
  • What is Data

 



Audience

Year 10 or higher students; some chapters are suitable for students in Year 8 or higher.

Learning Objectives

Learn about topics related to "Data"

Author: Subject Coach
Added on: 28th Sep 2018
