Descriptive Statistics: Summarizing Data for Beginners

Introduction

Statistics often sounds intimidating, but at its core, it is about making sense of data. Imagine you have a spreadsheet full of numbers—test scores, sales data, or survey responses. Looking at raw numbers alone doesn’t tell you much. You need a way to summarize, describe, and simplify the data into something meaningful.

This is where descriptive statistics comes in. It is the branch of statistics that organizes and summarizes data in ways that are easy to understand. Unlike inferential statistics, which makes predictions or generalizations, descriptive statistics focuses only on what the data shows—nothing more, nothing less.

2. Measures of Central Tendency (Finding the Center of Data)

These measures tell us what a “typical” or “central” value looks like in a dataset.

Mean (Average)

Add all the values and divide by how many there are.

Example: 5 people’s monthly internet bills: 800, 900, 1000, 1100, 1200 BDT.
Mean = (800 + 900 + 1000 + 1100 + 1200) ÷ 5 = 1000 BDT

Practical Use: A company might calculate the average salary to compare with competitors.
Caution: If one person earns 50,000 BDT and four others earn 1,000 each, the mean is 10,800 BDT, far more than most of the group actually earns.
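The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the variable names are made up for illustration, and the salary list assumes exactly four people earning 1,000 BDT alongside the one high earner.

```python
# Monthly internet bills in BDT, taken from the example above
bills = [800, 900, 1000, 1100, 1200]

# Mean = sum of the values divided by how many there are
average_bill = sum(bills) / len(bills)
print(average_bill)  # 1000.0

# Assumed salaries for the caution: one high earner, four others at 1,000 BDT
salaries = [50_000, 1_000, 1_000, 1_000, 1_000]
average_salary = sum(salaries) / len(salaries)
print(average_salary)  # 10800.0
```

The second result shows the problem: a single large value pulls the mean well above what four of the five people earn.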

Median (Middle Value)

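Arrange the data from smallest to largest and find the middle value. If there is an even number of values, take the average of the two middle ones.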

Example: Household monthly incomes: 8,000; 10,000; 12,000; 15,000; 100,000.
Median = 12,000 (the middle one)

Practical Use: Governments often report median income rather than mean income, because a few very high earners would otherwise overstate typical living conditions.
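To see how differently the median and the mean react to one extreme value, here is a small sketch using Python's standard `statistics` module. The four-value list near the end is a made-up illustration of the even-count case, not data from this article.

```python
from statistics import median

# Household monthly incomes from the example above
incomes = [8_000, 10_000, 12_000, 15_000, 100_000]

median_income = median(incomes)            # middle value of the sorted list
mean_income = sum(incomes) / len(incomes)  # pulled upward by the 100,000 outlier

print(median_income)  # 12000
print(mean_income)    # 29000.0

# With an even number of values, median() averages the two middle values.
# This shorter list is hypothetical, used only to show that behaviour.
even_incomes = [8_000, 10_000, 12_000, 15_000]
even_median = median(even_incomes)
print(even_median)  # 11000.0
```

Four of the five households earn 15,000 or less, so the median of 12,000 describes them far better than the mean of 29,000.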

Mode (Most Frequent Value)

The value that occurs most often.

Example: Shoe sizes sold: 38, 39, 39, 40, 40, 40, 41.
Mode = 40

Practical Use: Shoe manufacturers use mode to decide which sizes to produce more of.
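Counting by hand gets tedious with long lists, so here is a short sketch of finding the mode in Python. It uses `mode` and `multimode` from the standard `statistics` module and `Counter` from `collections`; the tied list at the end is a hypothetical example added to show what happens when two values share first place.

```python
from collections import Counter
from statistics import mode, multimode

# Shoe sizes sold, from the example above
sizes = [38, 39, 39, 40, 40, 40, 41]

most_common_size = mode(sizes)  # the single most frequent value
size_counts = Counter(sizes)    # how many times each size was sold

print(most_common_size)  # 40
print(size_counts[40])   # 3

# A dataset can have more than one mode. multimode() returns all of them.
# This list is made up for illustration.
tied_sizes = [38, 39, 39, 40, 40]
all_modes = multimode(tied_sizes)
print(all_modes)  # [39, 40]
```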

Summary:
- Use mean for roughly symmetric data with no extreme values.
- Use median when data has outliers.
- Use mode when identifying the most common category (e.g., favorite product, popular size).

3. Measures of Dispersion (Understanding the Spread of Data)

Dispersion shows how consistent or variable the data is.

Range

Highest value minus lowest value.
Example: Exam scores: 40 (lowest), 95 (highest) → Range = 55

Practical Use: A quick sense of the gap between the highest and lowest scores in a class.
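As a quick sketch, the range takes one line of Python. The article gives only the lowest and highest scores, so the three middle scores below are invented purely to make a complete list.

```python
# Exam scores: 40 and 95 come from the example above;
# 62, 75 and 88 are hypothetical filler values.
scores = [40, 62, 75, 88, 95]

# Range = highest value minus lowest value
score_range = max(scores) - min(scores)
print(score_range)  # 55
```

Because the range depends only on the two extreme values, changing any of the middle scores leaves it unchanged.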

Variance

Average of the squared deviations from the mean. High variance means the data is spread out; low variance means the data is clustered near the mean.
Example: Machine outputs per day: 500, 700, 1200 → high variance indicates unstable production.
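The definition can be followed step by step in Python. This is a sketch under one stated assumption: it treats the three days as the whole dataset and divides by the number of values (often called the population variance). Many tools instead divide by one less than the number of values (the sample variance), so both are shown.

```python
# Machine outputs per day, from the example above
outputs = [500, 700, 1200]

# Step 1: find the mean
mean_output = sum(outputs) / len(outputs)  # 800.0

# Step 2: square each value's distance from the mean
squared_deviations = [(x - mean_output) ** 2 for x in outputs]
# -> [90000.0, 10000.0, 160000.0]

# Step 3: average the squared deviations
population_variance = sum(squared_deviations) / len(outputs)

# Alternative convention: divide by n - 1 when the data is a sample
sample_variance = sum(squared_deviations) / (len(outputs) - 1)

print(round(population_variance, 2))  # 86666.67
print(sample_variance)                # 130000.0
```

Note that the result is in squared units (outputs squared), which is hard to interpret directly.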

Standard Deviation (SD)

Square root of the variance, in the same unit as the data.
Example:
Class A: 68, 70, 72, 74, 76 → SD ≈ 2.8 (small, consistent)
Class B: 40, 55, 72, 90, 100 → SD ≈ 22.0 (large, variable)

Practical Use: Businesses track SD to measure sales consistency.
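Here is a short sketch that puts numbers on the two classes above. It assumes each class list is the complete dataset and therefore uses `pstdev` (population standard deviation) from Python's standard `statistics` module; the module's `stdev` function uses the sample convention instead and would give somewhat larger values.

```python
from statistics import pstdev

# Exam scores for the two classes in the example above
class_a = [68, 70, 72, 74, 76]
class_b = [40, 55, 72, 90, 100]

# Population standard deviation: square root of the average squared deviation
sd_a = pstdev(class_a)
sd_b = pstdev(class_b)

print(round(sd_a, 2))  # 2.83
print(round(sd_b, 2))  # 22.0
```

Both classes have a mean close to 72, yet Class B's scores sit about 22 points from the mean on average in this sense, compared with under 3 points for Class A.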

Interquartile Range (IQR)

Spread of the middle 50% of data. Formula: Q3 – Q1.
Example: Household expenses: 5,000; 6,000; 7,000; 8,000; 9,000; 50,000.
Q1 = 6,000 and Q3 = 9,000 (the medians of the lower and upper halves of the data) → IQR = 3,000

Practical Use: A bank might use IQR to assess “typical” expenses, since extreme outliers such as the 50,000 above do not affect it.
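The quartiles in this example match the "median of each half" method, which can be sketched as follows. The helper name `quartiles_by_halves` is hypothetical, and this is only one of several quartile conventions: library functions such as Python's `statistics.quantiles` interpolate between values and can return different quartiles for the same data.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Hypothetical helper for illustration; assumes at least two values.
    With an odd number of values, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

# Household expenses from the example above
expenses = [5_000, 6_000, 7_000, 8_000, 9_000, 50_000]

q1, q3 = quartiles_by_halves(expenses)
iqr = q3 - q1
full_range = max(expenses) - min(expenses)

print(q1, q3)      # 6000 9000
print(iqr)         # 3000
print(full_range)  # 45000
```

Comparing the last two numbers shows why IQR is useful here: the single 50,000 value stretches the range to 45,000 but leaves the IQR at 3,000.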

Why Descriptive Statistics Matter for Non-Statisticians

  • Business: “What’s our average customer spend?”
  • Healthcare: “What’s the survival rate after treatment?”
  • Education: “What’s the most common grade students achieve?”
  • Everyday Life: “What’s the average rainfall in Dhaka in July?”

In short, descriptive statistics helps anyone understand patterns, trends, and summaries without drowning in raw data.

FAQs

Q1: Is descriptive statistics the same as inferential statistics?
A: No. Descriptive = summarizes what the data shows. Inferential = uses sample data to make predictions or generalizations about a larger group.

Q2: Why is median sometimes better than mean?
A: When there are extreme values (outliers), the mean gets pulled toward them. The median barely moves, so it stays closer to a typical value.

Q3: Do I need software for descriptive statistics?
A: Not necessarily. Simple averages can be calculated by hand or with a calculator. Tools like Excel, R, Python, and Power BI make it faster, especially for large datasets.

MCQ Practice

  • Which of the following is a measure of central tendency?
    a) Variance
    b) Mean
    c) Standard Deviation
    d) Range
    Answer: b) Mean
  • The middle value of an ordered dataset is called:
    a) Mean
    b) Median
    c) Mode
    d) Range
    Answer: b) Median
  • Standard deviation measures:
    a) The average value
    b) The most frequent value
    c) The spread of data around the mean
    d) The difference between max and min
    Answer: c) The spread of data around the mean
  • If a dataset is highly skewed by outliers, the best measure of central tendency is:
    a) Mean
    b) Median
    c) Mode
    d) Standard Deviation
    Answer: b) Median
  • Which chart is best to represent categorical data (like favorite color)?
    a) Histogram
    b) Bar chart
    c) Boxplot
    d) Line chart
    Answer: b) Bar chart