Calculating the Mean

When most people say "average," they mean the arithmetic mean. Add up all the values and divide by how many there are. That's it. You can think of it as a fair share: if every value in the dataset were replaced by the mean, the total wouldn't change. It's also the balancing point, because the distances of the values above the mean exactly cancel out the distances of the values below it.

Formula: Mean = (Sum of all values) ÷ (Number of values)

Dataset: 14, 19, 22, 22, 28, 31, 35
Sum = 14 + 19 + 22 + 22 + 28 + 31 + 35 = 171
Count = 7
Mean = 171 ÷ 7 = 24.43 (to 2 d.p.)
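The same calculation can be sketched in Python, using the dataset above. It checks the hand calculation against the standard library's `statistics.mean` and confirms the balancing-point property: the deviations from the mean sum to zero, up to floating-point rounding.

```python
from statistics import mean

data = [14, 19, 22, 22, 28, 31, 35]

total = sum(data)            # 171
count = len(data)            # 7
manual_mean = total / count  # sum of values divided by number of values

print(round(manual_mean, 2))  # 24.43
print(round(mean(data), 2))   # 24.43, same result from the standard library

# Balancing-point check: deviations above and below the mean cancel out
deviations = [x - manual_mean for x in data]
print(abs(sum(deviations)) < 1e-9)  # True
```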

The mean uses every value, which sounds great, but that's also why it can mislead you. One extreme value, high or low, can pull the mean far away from where most of the data actually sits. If one person in a group of ten scores 100 and the other nine all score 50, the mean is 55: a score nobody actually got, and higher than what nine out of ten people achieved. It looks fine on paper, but it's kind of lying to you.

For a weighted mean, multiply each value by its weight, sum the products, then divide by the total of the weights. Teachers use it all the time when different assignments count for different portions of your final grade.
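Here is a minimal weighted-mean sketch. The `weighted_mean` function and the course weighting (homework 20%, midterm 30%, final exam 50%) are hypothetical, made up for illustration; the weights don't have to add up to 100, because the function divides by their total.

```python
def weighted_mean(values, weights):
    """Sum of value * weight products, divided by the total of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical course: homework 20%, midterm 30%, final exam 50%
marks = [90, 70, 80]
weights = [20, 30, 50]
print(weighted_mean(marks, weights))  # 79.0
```

For comparison, the unweighted mean of the same three marks is 80. The heavily weighted final exam (80) and midterm (70) pull the result down slightly.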

Finding the Median

The median is the middle value of a sorted dataset. Sort everything from smallest to largest, find the middle, and that's it. Half the values sit below it, half above. And here's the key thing: making the highest or lowest value more extreme doesn't move the median at all.

Odd Number of Values

Dataset: 7, 11, 14, 18, 23, 29, 34
(Already sorted - 7 values)
Middle position = (7 + 1) ÷ 2 = position 4
Median = 18

Even Number of Values

When the count is even, there's no exact middle, so you take the two central values and average them.

Dataset: 5, 9, 13, 17, 24, 30
(6 values - two middle values are at positions 3 and 4)
Middle values: 13 and 17
Median = (13 + 17) ÷ 2 = 15

Always sort the data first. Seriously, always. Finding the median from an unsorted list is one of the most common mistakes, and the position only makes sense when the numbers are in order.
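Both cases can be handled by one small function, sketched below with the datasets above. It sorts first, then either returns the single middle value (odd count) or averages the two central values (even count). Python's `statistics.median` follows the same rule; this hand-written `median` just makes the steps visible.

```python
def median(values):
    """Median of a list of numbers: sort first, then take the middle."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)  # always sort first
    n = len(ordered)
    mid = n // 2              # 0-based index of the upper-middle value
    if n % 2 == 1:
        # Odd count: position (n + 1) / 2 counting from 1, which is index n // 2
        return ordered[mid]
    # Even count: average the two central values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 11, 14, 18, 23, 29, 34]))  # 18
print(median([5, 9, 13, 17, 24, 30]))       # 15.0
print(median([30, 5, 17, 9, 24, 13]))       # 15.0, unsorted input gives the same answer
```

Reading positions 3 and 4 straight off that unsorted list would give (17 + 9) ÷ 2 = 13, which is exactly the mistake described above.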

Identifying the Mode

The mode is simply the value that shows up most often. There's no formula. You count how many times each value appears and report the one with the highest count.

Dataset: 4, 7, 7, 9, 11, 13, 13, 13, 15
Frequency: 4→1, 7→2, 9→1, 11→1, 13→3, 15→1
Mode = 13 (appears 3 times)
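Counting frequencies by hand gets tedious, so here's a short sketch using `collections.Counter` from Python's standard library on the dataset above. Note that `most_common(1)` returns only one value even when several are tied, so it's only safe when the top count is unique.

```python
from collections import Counter

data = [4, 7, 7, 9, 11, 13, 13, 13, 15]

freq = Counter(data)  # maps each value to how many times it appears
print(dict(freq))     # {4: 1, 7: 2, 9: 1, 11: 1, 13: 3, 15: 1}

value, count = freq.most_common(1)[0]
print(value, count)   # 13 3
```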

A dataset can have more than one mode. If two values are tied for the top frequency, it's bimodal. Three or more and it's multimodal. If everything appears exactly once, there's no mode.

Bimodal example:
Dataset: 3, 5, 5, 8, 9, 9, 12
Mode = 5 and 9 (both appear twice)
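To handle ties, return every value that shares the top frequency. The `modes` function below is a hypothetical helper that follows the convention described above: when every value appears exactly once, it reports no mode (an empty list). It also works on non-numeric labels.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency.

    If every value appears exactly once, there is no mode,
    so an empty list is returned.
    """
    freq = Counter(values)
    if not freq:
        return []
    top = max(freq.values())
    if top == 1:
        return []  # everything appears once: no mode
    return [v for v, c in freq.items() if c == top]

print(modes([4, 7, 7, 9, 11, 13, 13, 13, 15]))  # [13]
print(modes([3, 5, 5, 8, 9, 9, 12]))            # [5, 9] (bimodal)
print(modes([1, 2, 3]))                         # [] (no mode)
print(modes(["red", "blue", "red", "green"]))   # ['red']
```

Python's `statistics.multimode` (3.8+) is similar, but when every value appears once it returns all of them instead of an empty list. That's why this sketch checks for that case explicitly.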

The mode is the only average that works with categorical data, meaning values that fall into named groups rather than numbers. The most popular item on a menu, the most common shoe size sold, the most frequent answer in a survey: those are all modal values. Even when the categories happen to be numbered, like shoe sizes, the mean size may not be a size that exists, so it tells a shop nothing useful about what to stock.

Choosing the Right Average

Which average you use depends on your data and what you're actually trying to find out.

Situation | Best Measure | Reason
Daily temperatures over a month | Mean | Symmetrically distributed, no extreme outliers
House prices in a neighbourhood | Median | A few luxury properties distort the mean
Most common dress size sold | Mode | Categorical; mean of sizes is not meaningful
Exam marks across a class | Mean | Useful when every mark contributes equally
Hospital waiting times | Median | A small number of very long waits skew the mean
Survey: favourite colour | Mode | Non-numerical, so only frequency applies
Wages in a company | Median | Executive salaries create extreme outliers

So basically: mean when the data is balanced and there are no wild outliers, median when things are skewed or there's a value dragging it in one direction, and mode when you want the most common value or when your data isn't even numeric.

The Effect of Outliers on Mean vs. Median

An outlier is a value way outside the normal range of the dataset. The mean is very sensitive to outliers. The median barely cares. And that difference matters a lot in real life.

Take this example: weekly sales figures for a team of seven sales reps.

Sales: 41, 44, 47, 49, 52, 55, 198

Mean = (41 + 44 + 47 + 49 + 52 + 55 + 198) ÷ 7
= 486 ÷ 7 = 69.4 units

Median = middle value (4th of 7) = 49 units

That 198 figure, probably a one-off bulk order, drags the mean all the way up to 69.4. But six out of seven reps sold between 41 and 55 units. The mean of 69.4 doesn't represent what a normal week looks like for any of them. The median of 49 is honest. The mean isn't.

Now take that outlier out:

Without outlier: 41, 44, 47, 49, 52, 55
Mean = 288 ÷ 6 = 48 units
Median = (47 + 49) ÷ 2 = 48 units
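Here's a quick way to reproduce both comparisons with Python's `statistics` module, using the sales figures above. Dropping the value 198 stands in for "removing the outlier"; in real data you'd need a rule for deciding what counts as one.

```python
from statistics import mean, median

sales = [41, 44, 47, 49, 52, 55, 198]
without_outlier = [x for x in sales if x != 198]  # drop the one extreme week

print(f"With outlier:    mean = {mean(sales):.1f}, median = {median(sales)}")
print(f"Without outlier: mean = {mean(without_outlier):.1f}, "
      f"median = {median(without_outlier):.1f}")
```

The first line shows a mean of 69.4 against a median of 49. The second shows both at 48.0: removing one value moves the mean by more than 21 units but moves the median by only 1.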

Without the outlier, the mean and median are identical (both 48), which suggests the remaining data is roughly symmetrical. When the mean and median are close, the mean is usually a fair summary. When they're far apart, go with the median and start looking for what's causing the gap.
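One way to turn "far apart" into a check is to measure the mean-median gap relative to the median. The `mean_median_gap` helper and the 10% cut-off below are hypothetical, chosen only for this sketch; there's no standard threshold, and the helper assumes the median isn't zero.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Relative gap between mean and median, as a fraction of the median.

    Hypothetical helper for illustration; assumes a non-zero median.
    """
    m, med = mean(values), median(values)
    return abs(m - med) / abs(med)

# Arbitrary cut-off chosen for this sketch, not a standard rule
THRESHOLD = 0.10

datasets = [
    ("with outlier", [41, 44, 47, 49, 52, 55, 198]),
    ("without outlier", [41, 44, 47, 49, 52, 55]),
]
for label, data in datasets:
    gap = mean_median_gap(data)
    if gap > THRESHOLD:
        verdict = "report the median and investigate"
    else:
        verdict = "mean is a reasonable summary"
    print(f"{label}: gap = {gap:.0%} -> {verdict}")
```

With the sales data, the gap is roughly 42% when the 198 is included and 0% without it. Where exactly to draw the line is a judgment call that depends on the data.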