Standard Deviation Explained Simply

Standard deviation answers a question the mean can't: how spread out is the data around the average? Two datasets can have the same mean and very different standard deviations, which tells you whether values cluster tightly or scatter widely. The number itself is in the same units as the data, which makes it more readable than its cousin, variance.

Key Takeaways

Standard deviation (SD) measures how far values typically deviate from the mean.
SD is the square root of variance.
In a normal distribution, ~68% of values fall within 1 SD of the mean, ~95% within 2 SD, ~99.7% within 3 SD.
Population SD uses N in the denominator; sample SD uses N − 1 (Bessel's correction).
Higher SD = more spread; lower SD = more consistent.

What Standard Deviation Tells You

A dataset with mean 70 and SD 2 is tightly clustered: most values are between 68 and 72. A dataset with the same mean of 70 and SD 15 is widely scattered: values commonly range from 55 to 85, and some fall further out.

Same average, completely different story. Standard deviation is what tells the two apart.

In practical terms:

Manufacturing: lower SD on a quality metric means more consistent products.
Investing: lower SD on returns means less volatile, more predictable returns.
Teaching: lower SD on test scores means more uniform performance.
Sports: lower SD on race times means a more consistent runner.

The Formula

Variance (σ²) = average of squared differences from the mean.

Standard Deviation (σ) = √variance.

For a population:

σ = √[Σ(x − μ)² / N]

For a sample (when the dataset is a subset of a larger population):

s = √[Σ(x − x̄)² / (N − 1)]

Where:

σ or s = standard deviation
μ or x̄ = mean
x = each value
N = number of values

The reason for the squaring: simple differences from the mean cancel out (positives and negatives sum to zero), so we square them. Taking the square root at the end brings the units back to the original scale.
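The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `std_dev` and its `sample` flag are made up for this example.

```python
import math

def std_dev(values, sample=False):
    """Return the standard deviation of a list of numbers.

    sample=False divides by N (population SD).
    sample=True divides by N - 1 (sample SD, Bessel's correction).
    Hypothetical helper written for this article.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a standard deviation")
    mean = sum(values) / n
    # Squaring stops positive and negative differences from cancelling out
    squared_diffs = sum((x - mean) ** 2 for x in values)
    variance = squared_diffs / (n - 1 if sample else n)
    # The square root brings the result back to the original units
    return math.sqrt(variance)
```

Calling `std_dev(data)` gives the population figure and `std_dev(data, sample=True)` the sample figure for the same list.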

Worked Example

Five test scores: 70, 72, 76, 80, 82. Mean = 76.

x	(x − 76)	(x − 76)²
70	−6	36
72	−4	16
76	0	0
80	4	16
82	6	36

Sum of squared deviations: 104.

Population SD: √(104 / 5) = √20.8 ≈ 4.56
Sample SD: √(104 / 4) = √26 ≈ 5.10

Most everyday calculations use sample SD because real data is almost always a sample of a larger population.
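The worked example can be checked with Python's standard `statistics` module, which provides `pstdev` (divides by N) and `stdev` (divides by N − 1). The variable names below are only illustrative.

```python
import statistics

scores = [70, 72, 76, 80, 82]

mean = statistics.mean(scores)
population_sd = statistics.pstdev(scores)  # divides by N
sample_sd = statistics.stdev(scores)       # divides by N - 1

print(f"mean = {mean}")
print(f"population SD = {population_sd:.2f}")  # 4.56
print(f"sample SD = {sample_sd:.2f}")          # 5.10
```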

The 68-95-99.7 Rule

For data that follows a normal distribution (bell curve), standard deviation has a powerful interpretive shortcut:

About 68% of values fall within ±1 SD of the mean
About 95% fall within ±2 SD
About 99.7% fall within ±3 SD

This is the empirical rule, and it is one of the most useful facts in introductory statistics.

Example: Adult height in a population has mean 170 cm and SD 7 cm.

68% are between 163 and 177 cm
95% are between 156 and 184 cm
99.7% are between 149 and 191 cm

A 6-foot-tall person (183 cm) is roughly 1.86 SDs above the mean, meaning they are taller than about 97% of the population if heights are normally distributed.

The rule only applies cleanly to normal distributions. Skewed distributions (income, home prices) will produce different proportions.
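The percentages in the rule are rounded values of the normal curve, and they can be reproduced with `statistics.NormalDist` (Python 3.8+). The sketch below assumes an exactly normal distribution, which real data only approximates; `share_within` is a helper name invented here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")

# Height example from the text: mean 170 cm, SD 7 cm (assumed normal)
heights = NormalDist(mu=170, sigma=7)
print(f"share shorter than 183 cm: {heights.cdf(183):.1%}")
```

The last line is the same calculation as the 6-foot example: it converts 183 cm into a position on the curve and reports the share of the population below it.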

Population vs Sample Standard Deviation

The choice between N and N−1 in the denominator looks minor, but it matters, especially for small datasets.

Use population SD (divide by N) when:

Your data IS the entire population you care about
Examples: all students in a single class, every employee at a company on a given day

Use sample SD (divide by N−1) when:

Your data is a subset used to estimate the broader population's SD
Examples: a survey, a quality-control sample, scientific experiments

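The N−1 (Bessel's correction) compensates for a built-in bias: deviations are measured from the sample's own mean, which sits closer to the sampled values than the true population mean does, so dividing by N tends to underestimate the population spread. Many statistical tools default to sample SD because that's the more common situation, but not all do (NumPy's std, for example, divides by N unless you pass ddof=1), so check before relying on a default.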
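A small simulation makes the effect of Bessel's correction visible. The setup is an assumption chosen for illustration: many samples of 5 values are drawn from a normal distribution whose true variance is 1, and the two denominators are compared on average. The comparison is done on variance, since that is the quantity the correction makes unbiased.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 5
TRIALS = 20_000  # true variance of the source distribution is 1.0

def sum_squared_deviations(sample):
    """Sum of squared differences from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    ss = sum_squared_deviations(sample)
    total_n += ss / SAMPLE_SIZE                # population formula
    total_n_minus_1 += ss / (SAMPLE_SIZE - 1)  # Bessel's correction

avg_n = total_n / TRIALS
avg_n_minus_1 = total_n_minus_1 / TRIALS
print(f"average variance dividing by N:     {avg_n:.3f}")
print(f"average variance dividing by N - 1: {avg_n_minus_1:.3f}")
```

With samples of 5, dividing by N should land near 0.8 on average (too low), while dividing by N − 1 should land near the true value of 1.0.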

Variance vs Standard Deviation

Variance is SD squared. It is mathematically convenient (no square roots), but its units are squared too; variance of test scores is in "score-squared," which is hard to interpret.

Metric	Formula	Units	When to Use
Variance	σ² or s²	Squared units	In further calculations (regression, ANOVA)
Standard Deviation	σ or s	Same as data	Reporting, interpretation, comparison

Always report SD in plain-language contexts. Use variance internally where it simplifies the math.

Z-Scores

A z-score expresses how many standard deviations a value is from the mean:

z = (x − mean) / SD

A z-score of 2.0 means the value is 2 SDs above the mean. Negative z-scores are below the mean. Z-scores standardize values from any distribution onto a common scale, making cross-comparison possible.

Example: A student scores 88 on a test where the class mean is 76 and SD is 6.

z = (88 − 76) / 6 = 2.0

The student is 2 SDs above the class mean, which puts them in roughly the top 2.3% of a normal distribution (about 2.5% by the 68-95-99.7 shortcut).
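The z-score formula and the student example translate directly into code. This is a sketch: `z_score` is a hypothetical helper, and the final step assumes the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

z = z_score(88, mean=76, sd=6)
print(f"z = {z}")  # 2.0

# Share of a normal distribution above this z-score
share_above = 1 - NormalDist().cdf(z)
print(f"share scoring higher: {share_above:.1%}")  # 2.3%
```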

Real-World Scenarios

Scenario 1: Investment volatility. Fund A has a 10% average annual return with an 8% SD. Fund B has a 10% average annual return with an 18% SD. Same expected return, dramatically different ride. Within 1 SD of the mean, Fund A's returns run from +2% to +18%; Fund B's run from −8% to +28%. SD captures the risk that the two funds' identical means obscure.

Scenario 2: Manufacturing. A bottling plant fills bottles to a target of 500 ml. Acceptable tolerance is ±5 ml. Quality control samples show mean 500.2 ml and SD 1.4 ml. About 99.7% of bottles fall within ±4.2 ml of the mean (3 SDs), well within tolerance. Doubling the SD to 2.8 would push 3 SD to ±8.4 ml, a meaningful quality problem.

Scenario 3: Test grading. Two classes both average 75. Class A's SD is 4 (tight cluster, similar performance). Class B's SD is 15 (wide spread, some students far behind or ahead). The teaching approach for these two classes should differ even though the means are identical.

Scenario 4: Wait times. A coffee shop averages a 4-minute wait with a 1-minute SD. Most customers wait 3–5 minutes. The same shop with a 4-minute average and a 3-minute SD has many customers waiting 1 minute and many waiting 7 minutes or more: same average, much worse experience.
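The bottling numbers in Scenario 2 can be turned into an estimated share of bottles inside the ±5 ml tolerance. The sketch below assumes fill volumes are normally distributed, which the scenario does not state; `share_in_tolerance` is an illustrative name.

```python
from statistics import NormalDist

LOWER, UPPER = 495.0, 505.0  # 500 ml target with ±5 ml tolerance

def share_in_tolerance(mean, sd):
    """Estimated share of fills between LOWER and UPPER, assuming normality."""
    fills = NormalDist(mu=mean, sigma=sd)
    return fills.cdf(UPPER) - fills.cdf(LOWER)

current = share_in_tolerance(mean=500.2, sd=1.4)
doubled = share_in_tolerance(mean=500.2, sd=2.8)

print(f"SD 1.4 ml: {current:.2%} of bottles in tolerance")
print(f"SD 2.8 ml: {doubled:.2%} of bottles in tolerance")
```

Under that assumption, nearly every bottle passes at an SD of 1.4 ml, while doubling the SD leaves several percent of bottles outside the limits.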

Common Mistakes

Reporting mean without SD. Two datasets with the same mean can be wildly different. Always pair them.

Using population SD when you have a sample. Dividing by N tends to underestimate the true SD. Use N−1 when in doubt.

Comparing SDs across different units or scales. SDs of test scores and SDs of weights are not directly comparable. Use coefficient of variation (SD/mean) for cross-scale comparison.

Applying the 68-95-99.7 rule to non-normal data. Skewed distributions don't follow the rule. Check the shape first.

Confusing SD with standard error. Standard error measures uncertainty in the mean estimate, not spread in the data. Standard error = SD / √N.

Treating SD as a maximum range. SD is a typical deviation, not a hard limit. Some values will be 3+ SDs from the mean in any large dataset.
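The difference between SD and standard error (the "Confusing SD with standard error" mistake above) is easier to see side by side. This sketch reuses the five test scores from the worked example; `standard_error` is a helper name invented here.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)

scores = [70, 72, 76, 80, 82]
sd = statistics.stdev(scores)  # spread of the individual scores
se = standard_error(scores)    # uncertainty in the mean of those scores

print(f"SD = {sd:.2f}, SE = {se:.2f}")  # SD = 5.10, SE = 2.28
```

Collecting more scores would shrink the standard error, because the mean becomes better pinned down, but it would not systematically shrink the SD.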

Coefficient of Variation: Relative Spread

When comparing variability across datasets with different means or units, use the coefficient of variation (CV):

CV = (SD / Mean) × 100%

A CV of 5% means SD is 5% of the mean: a tight distribution. A CV of 50% means much more variability relative to scale.

Example: An investment with an 8% return and a 4% SD has a CV of 50%. Another with a 20% return and a 6% SD has a CV of 30%, so it is actually less volatile relative to its return.
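The CV formula is a one-liner in code. The sketch below uses a hypothetical `coefficient_of_variation` helper and the two investments from the example; it rejects a mean of zero, where the ratio is undefined.

```python
def coefficient_of_variation(sd, mean):
    """Return SD as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * sd / mean

cv_first = coefficient_of_variation(sd=4, mean=8)    # 8% return, 4% SD
cv_second = coefficient_of_variation(sd=6, mean=20)  # 20% return, 6% SD

print(f"first investment:  CV = {cv_first:.0f}%")   # 50%
print(f"second investment: CV = {cv_second:.0f}%")  # 30%
```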

FAQ

What does a high standard deviation mean? It means the values are spread widely around the mean. A low SD means values cluster tightly.

Is standard deviation the same as variance? No. SD is the square root of variance. SD is in the original units; variance is in squared units.

When should I use sample vs population standard deviation? Use sample SD (N−1) when your data is a sample of a larger population (the most common case). Use population SD (N) only when you have data for the entire population.

What is the 68-95-99.7 rule? For normal distributions: 68% of values fall within 1 SD of the mean, 95% within 2 SD, 99.7% within 3 SD. A quick way to read SD intuitively.

Can standard deviation be negative? No. SD is the square root of an average of squared differences, so it is always zero or positive. An SD of zero means every value is identical.

How do I interpret a z-score? A z-score tells you how many SDs a value is from the mean. Positive is above, negative is below. |z| > 2 generally indicates an unusual value in a normal distribution.

Why is variance useful if SD is in better units? Variance has cleaner mathematical properties for further calculations (regression, hypothesis testing). It is preferred in formulas; SD is preferred in reporting.

Related Tools

The Standard Deviation Calculator computes both sample and population SD. The Variance Calculator handles variance directly, and the Z-Score Calculator converts raw values to standardized scores. For full summary statistics, use the Statistics Calculator.

Final Thoughts

Standard deviation is one of the most underused statistics in everyday data communication. Mean tells you the center; SD tells you whether the center is meaningful. A class with a mean of 75 and SD of 4 is a very different class from one with the same mean and SD of 15, and yet most reports only ever cite the mean. Build the habit of always reporting both numbers, and your data conversations get sharper immediately.