Hypothesis Testing | Module 6 | Applied Statistics Course

Hypothesis Testing

From data to evidence-based decisions — the complete guide for students, analysts, and researchers.

Introduction: Why Hypothesis Testing Matters

Every day, decisions are made based on data. A pharmaceutical company wants to know whether a new drug lowers blood pressure more effectively than an existing treatment. A bank wants to know whether a new credit-scoring model reduces loan defaults. A marketing team wants to know whether a new email subject line increases open rates. In every one of these cases, the answer cannot come from opinion or instinct — it must come from evidence.

Hypothesis testing is the statistical engine that converts data into evidence. It gives analysts, researchers, and decision-makers a structured, mathematically rigorous way to determine whether what they observe in a sample is real — or whether it could have occurred by chance.

Without hypothesis testing, data science and applied statistics would be little more than description. With it, we can infer truths about populations from samples, validate scientific claims, test business assumptions, and support regulatory decisions with defensible logic.

This module — Module 6 of the Applied Statistics course — is dedicated entirely to this foundational method. By the end, you will understand every component of a hypothesis test, know which test to choose for which situation, and be able to apply the process in finance, auditing, business, and research contexts.

Learning Objectives

After completing this module, you will be able to: (1) state null and alternative hypotheses correctly; (2) interpret p-values and significance levels; (3) distinguish Type I from Type II errors; (4) execute Z-tests, t-tests, and chi-square tests; and (5) apply hypothesis testing to real professional scenarios.

What is Hypothesis Testing?

Simple Definition

Hypothesis testing is a statistical procedure used to decide whether there is enough evidence in a sample of data to support a specific claim about a population.

Statistical Definition

Formally, hypothesis testing is an inferential statistical method that evaluates two competing statements about a population parameter — the null hypothesis and the alternative hypothesis — using sample data, a chosen significance level, and a calculated test statistic, in order to make a probabilistic decision.

Practical Meaning

Think of hypothesis testing as a courtroom trial for data. The null hypothesis is the "presumption of innocence" — we assume nothing has changed, nothing is different, no effect exists. The data is the evidence. The statistical test is the judge. If the evidence is strong enough (measured by the p-value), we reject the assumption of innocence and conclude that something real is happening.

Real-World Examples
  • Healthcare: Does a new vaccine reduce infection rates significantly compared to a placebo?
  • Finance: Does the average return of a portfolio significantly exceed the market benchmark?
  • Marketing: Does a new website design lead to a statistically higher conversion rate?
  • Auditing: Is the error rate in a batch of invoices within acceptable control limits?
  • Education: Does an online tutoring program significantly improve student test scores?

The Core Idea: Falsifiability

The philosophy behind hypothesis testing comes from the scientific method — specifically, the principle that you cannot definitively prove a theory, but you can attempt to disprove it. That is why we test the null hypothesis (the "no effect" claim) and ask: is our data inconsistent enough with this claim to reject it?

Null Hypothesis (H₀)

Definition

The null hypothesis (H₀) is the default assumption that there is no effect, no difference, no relationship, or no change. It is the statement we test and attempt to reject based on evidence from data.

Purpose of the Null Hypothesis

The null hypothesis serves as the baseline position. Just as a defendant is presumed innocent until proven guilty, a population parameter is assumed unchanged or unaffected until sample data provides sufficient evidence against that assumption. This framing protects us from making false claims based on random noise in the data.

Practical Examples of Null Hypotheses

Domain | Research Question | Null Hypothesis (H₀)
Finance | Does Strategy A outperform Strategy B? | Mean return of A = Mean return of B
Healthcare | Does Drug X reduce blood pressure? | Mean reduction with Drug X = Mean reduction with placebo
Marketing | Does the new ad campaign increase sales? | Mean sales before = Mean sales after campaign
Auditing | Is the error rate in compliance? | Population error rate ≤ acceptable threshold (e.g., 5%)
Education | Does online tutoring improve scores? | Mean score with tutoring = Mean score without tutoring
Manufacturing | Does machine produce correct weight? | Mean weight = target weight (e.g., 500g)

Notice that every null hypothesis contains an equality statement. This is a critical rule: the null hypothesis always includes an "=" sign (whether stated as =, ≤, or ≥). This allows us to specify a single reference value against which we calculate probabilities.

Alternative Hypothesis (H₁ or Hₐ)

Definition

The alternative hypothesis (H₁) is the claim we are trying to find evidence for. It represents the possibility of an effect, difference, or relationship. If data provides sufficient evidence against H₀, we accept H₁.

Types of Alternative Hypotheses

The alternative hypothesis can take three forms depending on the direction of your claim:

Test Type | Symbol | Meaning | Example
Two-tailed | H₁: μ ≠ μ₀ | Parameter is different (either direction) | New drug changes blood pressure (up or down)
Right-tailed (Upper) | H₁: μ > μ₀ | Parameter is greater than assumed value | New training increases productivity
Left-tailed (Lower) | H₁: μ < μ₀ | Parameter is less than assumed value | New process reduces defect rate

Why Direction Matters

The direction of your alternative hypothesis determines where the "rejection region" is placed in the distribution. A two-tailed test splits the rejection area between both tails. A one-tailed test concentrates it in one tail, making it more powerful for detecting effects in that specific direction — but incapable of detecting effects in the opposite direction.

Important Rule

You must specify your alternative hypothesis before collecting data. Choosing the direction after seeing the results is a form of "p-hacking" and is a serious research integrity violation.

Null vs. Alternative Hypothesis: Comparison Table

Feature | Null Hypothesis (H₀) | Alternative Hypothesis (H₁)
Purpose | Default assumption to test against | The claim we seek evidence for
Contains | Equality (=, ≤, ≥) | Inequality (≠, >, <)
Default position | We assume it is true initially | Must be supported by evidence
What we calculate | Probability assuming H₀ is true | Accepted when H₀ is rejected
Decision | Reject or fail to reject | Accept if H₀ is rejected
Analogy | Defendant presumed innocent | Prosecution's claim of guilt

Significance Level (Alpha, α)

Definition

The significance level (α) is the probability threshold used to decide whether to reject the null hypothesis. It represents the maximum probability of making a Type I error (rejecting a true null hypothesis) that the researcher is willing to accept.

Common Significance Levels

Significance Level | Meaning | Typical Use
α = 0.10 (10%) | 10% chance of Type I error | Exploratory research, pilot studies
α = 0.05 (5%) | 5% chance of Type I error | Standard in social sciences, business, finance
α = 0.01 (1%) | 1% chance of Type I error | Medical trials, quality control, high-stakes decisions
α = 0.001 (0.1%) | 0.1% chance of Type I error | Very high-stakes decisions (particle physics goes further still, requiring a 5σ result, roughly 3 × 10⁻⁷)

The 0.05 Rule Explained

The 5% significance level, popularized by statistician Ronald Fisher in the 1920s, has become the default standard in most fields. It means: "I am willing to accept a 5% chance of incorrectly rejecting the null hypothesis." In practical terms, if you run the same experiment 100 times under conditions where H₀ is actually true, you would expect to (incorrectly) reject it about 5 times by chance alone.

The choice of α should reflect the cost of a wrong decision. In medical research or auditing, where false positives carry serious consequences, α = 0.01 is more appropriate. In exploratory business research where false positives are less costly, α = 0.10 may be acceptable.
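The long-run reading of α can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the function name, the number of trials, the sample size, and the random seed are all choices made for this example, and the data are drawn from a standard normal population so that H₀: μ = 0 is true by construction.

```python
import random
from statistics import NormalDist

def false_positive_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Share of two-tailed Z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Sample from N(0, 1), so H0: mu = 0 holds and sigma = 1 is known.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials

rate = false_positive_rate()
print(f"Rejected a true H0 in {rate:.1%} of simulated experiments")
```

The printed share should land close to 5%, and rerunning with alpha=0.01 or alpha=0.10 should move it toward 1% or 10%.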

Confidence Level

The confidence level is simply (1 − α). At α = 0.05, you have 95% confidence. At α = 0.01, you have 99% confidence. A higher confidence level requires stronger evidence to reject the null hypothesis.

The p-value

Definition

The p-value is the probability of observing a test statistic as extreme as — or more extreme than — the one calculated from the sample data, assuming the null hypothesis is true.

How to Interpret the p-value

The p-value answers this precise question: "If the null hypothesis were actually true, how likely would we be to see data this extreme just by chance?" A very small p-value means such extreme data would be very unlikely if H₀ were true, which counts as evidence against H₀.

p-value | Interpretation | Decision
p < 0.001 | Extremely strong evidence against H₀ | Reject H₀
p < 0.01 | Very strong evidence against H₀ | Reject H₀
p < 0.05 | Sufficient evidence against H₀ | Reject H₀ (at α = 0.05)
0.05 ≤ p < 0.10 | Marginal / weak evidence | Fail to reject H₀ (at α = 0.05)
p ≥ 0.10 | Insufficient evidence against H₀ | Fail to reject H₀

p < 0.05: What It Really Means

When p < 0.05, we say the result is statistically significant at the 5% level. This means: assuming H₀ is true, the probability of observing a result at least as extreme as ours is less than 5%. Because this is so unlikely under H₀, we reject H₀ in favor of H₁.

p > 0.05: What It Really Means

When p > 0.05, we fail to reject H₀. This does NOT mean H₀ is proven true. It simply means our data did not provide enough evidence to reject it. Absence of evidence is not evidence of absence.

Critical Misconception

The p-value is NOT the probability that the null hypothesis is true. It is the probability of observing your data (or more extreme data) IF the null hypothesis were true. This distinction is frequently confused even by professionals.

Worked Example

A bank claims its loan approval process takes an average of 48 hours (μ₀ = 48). You sample 40 applications and find a mean of 52 hours with a standard deviation of 10 hours. A two-tailed Z-test gives Z = (52 − 48) / (10 / √40) ≈ 2.53 and p ≈ 0.011.

Interpretation: If the true mean were 48 hours, there would be only about a 1.1% probability of observing a sample mean at least 4 hours away from 48 in either direction. Since 0.011 < 0.05, we reject H₀ and conclude that the mean approval time differs significantly from 48 hours; the sample points to longer approval times.
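The arithmetic behind this example can be checked in a few lines. This is a minimal sketch that treats the sample standard deviation as a stand-in for σ (a large-sample approximation, which is an assumption here) and reports both the one-sided and the two-sided p-value, since the two answer slightly different questions.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, s, n = 48, 52, 10, 40          # figures from the worked example

z = (xbar - mu0) / (s / sqrt(n))           # distance from mu0 in standard errors
p_one_sided = 1 - NormalDist().cdf(z)      # P(Z >= z): "a mean this high or higher"
p_two_sided = 2 * p_one_sided              # "a mean this far from 48, either way"

print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

The two-sided value comes out near 0.011 and the one-sided value near 0.006, so either reading leads to rejecting H₀ at α = 0.05.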

Type I Error and Type II Error

Type I Error (False Positive)

Definition

A Type I Error occurs when we reject a true null hypothesis. We conclude that an effect exists when, in reality, it does not. The probability of a Type I error is equal to the significance level (α).

Real-World Examples of Type I Errors:

  • Medicine: Approving a drug that is no more effective than a placebo because trial results appeared significant by chance.
  • Finance: Concluding that a trading strategy generates excess returns when its performance was due to random market fluctuations.
  • Auditing: Flagging a compliant company as non-compliant due to random sampling results.
  • Marketing: Rolling out a new campaign thinking it improved conversions when the increase was random noise.

Type II Error (False Negative)

Definition

A Type II Error occurs when we fail to reject a false null hypothesis. We conclude no effect exists when, in reality, one does. The probability of a Type II error is denoted by β (beta).

Real-World Examples of Type II Errors:

  • Medicine: Failing to approve an effective drug because the clinical trial sample was too small to detect the true benefit.
  • Finance: Dismissing an actually profitable trading strategy as statistically insignificant due to limited data.
  • Auditing: Clearing a non-compliant company because the sampled invoices happened to look clean.
  • Quality Control: Passing a defective production batch because the sample used did not reveal the defects.

Statistical Power

Power = 1 − β. Power is the probability of correctly rejecting a false null hypothesis. A well-designed study aims for power ≥ 0.80 (80%), meaning there is at least an 80% chance of detecting a real effect when it exists. Power is increased by using larger sample sizes or choosing more sensitive tests.

Type I vs. Type II Error: Comparison Table

Feature | Type I Error | Type II Error
Also called | False Positive | False Negative
What happened | Rejected a true H₀ | Failed to reject a false H₀
Probability | α (significance level) | β (beta)
Controlled by | Reducing α | Increasing sample size / power
Trade-off | Lowering α increases β | Lowering β increases α (if n is fixed)
Medical analogy | Diagnosing a healthy person as sick | Missing a disease in a sick person
Legal analogy | Convicting an innocent person | Acquitting a guilty person

The Type I / Type II Trade-Off

  • For a fixed sample size, reducing Type I error risk (lowering α) increases Type II error risk.
  • The solution is not to choose between them. Increasing the sample size lowers β at any fixed α, which makes it possible to reduce both risks at once.
  • Context determines which error is more costly. In drug approval, a Type I error (approving ineffective drugs) can be very harmful. In disease screening, a Type II error (missing a disease) can be fatal.
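The trade-off can be made concrete for a right-tailed one-sample Z-test, where power has a closed form. The sketch below is an illustration only: the function name and the scenario (σ = 8, a true shift of 3 units above μ₀) are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(delta, sigma, n, alpha):
    """Power of a right-tailed one-sample Z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = delta / (sigma / sqrt(n))      # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical scenario: sigma = 8, true mean 3 units above mu0.
for n in (36, 100):
    for alpha in (0.05, 0.01):
        power = power_right_tailed_z(delta=3, sigma=8, n=n, alpha=alpha)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={1 - power:.3f}  power={power:.3f}")
```

At a fixed n, moving from α = 0.05 to α = 0.01 raises β; at a fixed α, moving from n = 36 to n = 100 lowers it.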

The Hypothesis Testing Process: 7 Steps

  1. State the Hypotheses. Write both the null hypothesis (H₀) and the alternative hypothesis (H₁). Decide whether the test is one-tailed (directional) or two-tailed (non-directional). The hypotheses must be mutually exclusive and exhaustive.
  2. Choose the Significance Level (α). Select α before collecting data (typically 0.05). This sets the threshold for "sufficient evidence." Your choice should reflect the domain, the cost of errors, and any regulatory standards.
  3. Select the Appropriate Test and Collect Data. Choose the correct statistical test (Z-test, t-test, chi-square, F-test, etc.) based on the data type, sample size, and what is being measured. Then collect or obtain your sample data.
  4. Check Assumptions. Every test has assumptions (e.g., normality, independence, homogeneity of variance). Verify these before proceeding. Violating assumptions can invalidate the test.
  5. Calculate the Test Statistic. Apply the formula for your chosen test to compute the test statistic (Z, t, χ², etc.). For Z and t statistics, this value measures how far your sample result lies from the null hypothesis value in units of standard error.
  6. Find the p-value and Compare to α. Using the test statistic and its distribution, calculate the p-value. Compare it to your pre-set significance level: if p < α, reject H₀; if p ≥ α, fail to reject H₀.
  7. Draw a Conclusion in Context. State your decision in plain language related to the original question. Do not just say "reject H₀"; explain what it means for the problem at hand. Report effect size and confidence intervals when relevant.

Z-Test

Definition

A Z-test is a statistical test used to determine whether the mean of a population significantly differs from a hypothesized value (or whether two population means differ), when the population standard deviation (σ) is known and the sample size is large (n ≥ 30).

When to Use a Z-Test

  • Population standard deviation (σ) is known
  • Sample size is large (n ≥ 30)
  • Data is approximately normally distributed (or n is large enough for CLT to apply)
  • Testing means or proportions

The Z-Test Formula

One-Sample Z-Test:
Z = (x̄ − μ₀) / (σ / √n)

Where:
x̄ = sample mean
μ₀ = hypothesized population mean
σ = population standard deviation (known)
n = sample size

Two-Sample Z-Test:
Z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)

Critical Values for Z-Test

Test Type | α = 0.10 | α = 0.05 | α = 0.01
Two-tailed | ±1.645 | ±1.96 | ±2.576
Right-tailed | +1.282 | +1.645 | +2.326
Left-tailed | −1.282 | −1.645 | −2.326
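These critical values do not have to be memorized; they are quantiles of the standard normal distribution. As a sketch, the table can be reproduced with Python's standard-library NormalDist class (the helper name z_critical is an assumption made for this example).

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def z_critical(alpha, tail):
    """Critical Z value for tail in {'two', 'right', 'left'}."""
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)   # use as +/- this value
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

for tail in ("two", "right", "left"):
    row = "  ".join(f"{z_critical(a, tail):+.3f}" for a in (0.10, 0.05, 0.01))
    print(f"{tail:>5}: {row}")
```

A two-tailed test puts α/2 in each tail, which is why its critical value at α = 0.10 matches the one-tailed value at α = 0.05.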

Step-by-Step Worked Example: Z-Test

Business Example — Quality Control

Problem: A cola bottling company claims each bottle contains 500 ml (μ₀ = 500). The machine's standard deviation is known to be σ = 8 ml. A quality inspector randomly samples n = 36 bottles and finds a mean volume of x̄ = 503 ml. Test at α = 0.05 whether the machine is overfilling.

Step 1 — Hypotheses:

  • H₀: μ = 500 ml (machine fills correctly)
  • H₁: μ > 500 ml (machine is overfilling) — right-tailed test

Step 2 — Significance Level: α = 0.05. Critical value for right-tailed test: Z_critical = +1.645

Step 3 — Test Statistic:

Z = (503 − 500) / (8 / √36)
Z = 3 / (8 / 6)
Z = 3 / 1.333
Z = 2.25

Step 4 — Decision: Z = 2.25 > Z_critical = 1.645, so we reject H₀.

Alternatively: p-value = P(Z > 2.25) ≈ 0.012. Since 0.012 < 0.05, reject H₀.

Step 5 — Conclusion: At the 5% significance level, there is sufficient statistical evidence that the machine is overfilling bottles. The mean fill (503 ml) is significantly greater than the target (500 ml). Maintenance should be scheduled.
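The bottling example can be reproduced with a short function. This is a minimal sketch of a right-tailed one-sample Z-test; the function name and return format are choices made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_right(xbar, mu0, sigma, n, alpha=0.05):
    """Right-tailed one-sample Z-test; returns (z, p_value, reject_h0)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value, p_value < alpha

z, p_value, reject = one_sample_z_right(xbar=503, mu0=500, sigma=8, n=36)
print(f"Z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing p to α and comparing Z to the critical value are equivalent decision rules, so both routes give the same answer.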

t-Test

Definition

A t-test is a statistical test used to compare means when the population standard deviation is unknown and must be estimated from the sample. The t-test is the workhorse of applied statistics for most real-world situations.

When to Use a t-Test

  • Population standard deviation is unknown (uses sample standard deviation, s)
  • Sample size can be small (n < 30) or large (the test remains valid with large samples)
  • Population is approximately normally distributed

Three Types of t-Tests

1. One-Sample t-Test

Used to compare the mean of a single sample to a known or hypothesized population mean.

t = (x̄ − μ₀) / (s / √n)
Degrees of freedom: df = n − 1

Finance Example

A fund claims its monthly returns average 2%. A sample of 15 months shows x̄ = 2.4% with s = 0.8%. Test at α = 0.05.

t = (2.4 − 2.0) / (0.8 / √15)
t = 0.4 / 0.2066
t = 1.936

df = 14. t_critical (two-tailed, α=0.05, df=14) = ±2.145. Since |1.936| < 2.145, fail to reject H₀. There is insufficient evidence that the fund's returns differ significantly from 2%.
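The fund example can be verified from its summary statistics. The sketch below uses only the standard library, so the critical value is taken from the t-table figure quoted in the text rather than computed; the function name is an assumption for this illustration.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1

t_stat, df = one_sample_t(xbar=2.4, mu0=2.0, s=0.8, n=15)
t_critical = 2.145   # two-tailed, alpha = 0.05, df = 14 (tabulated value quoted in the text)
reject = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```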

2. Independent Samples t-Test (Two-Sample t-Test)

Used to compare the means of two independent groups.

t = (x̄₁ − x̄₂) / √(s_p² × (1/n₁ + 1/n₂))

Where s_p² = pooled variance = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁+n₂−2)
Degrees of freedom: df = n₁ + n₂ − 2

Business Example — Employee Training

Two groups of employees take a skills test: Group A (standard training, n=12, x̄=74, s=8) and Group B (new training, n=12, x̄=82, s=9). Did the new training make a significant difference?

After computing the pooled variance (s_p² = 72.5) and the t-statistic for Group B minus Group A (t ≈ 2.30, df = 22), compare to t_critical = ±2.074 at α = 0.05, two-tailed. Since 2.30 > 2.074, reject H₀. The group with the new training scored significantly higher.
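Working the pooled formula through with the summary statistics from this example gives t ≈ 2.30. The sketch below shows the calculation; the function name and the choice to report Group B minus Group A are assumptions for this illustration, and the critical value is the tabulated figure quoted in the text.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two-sample t statistic for (mean2 - mean1), with its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / std_error, df

# Group A: standard training; Group B: new training.
t_stat, df = pooled_t(mean1=74, s1=8, n1=12, mean2=82, s2=9, n2=12)
t_critical = 2.074   # two-tailed, alpha = 0.05, df = 22 (tabulated value)
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {abs(t_stat) > t_critical}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances, (64 + 81) / 2 = 72.5.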

3. Paired Samples t-Test

Used when the same subjects are measured twice (before/after), or when two measurements are naturally paired. Works on the differences (d = x_after − x_before).

t = d̄ / (s_d / √n)

Where:
d̄ = mean of differences
s_d = standard deviation of differences
n = number of pairs
Degrees of freedom: df = n − 1

Audit Example — Pre/Post Training

An audit firm trains 10 staff and measures error rates before and after. Mean difference d̄ = −3.2 errors/week, s_d = 2.1, n = 10.

t = −3.2 / (2.1 / √10) = −3.2 / 0.664 = −4.82

df = 9. t_critical (left-tailed, α=0.05) = −1.833. Since −4.82 < −1.833, reject H₀. Training significantly reduced error rates.
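Because a paired test is a one-sample test on the differences, the audit example needs only the three summary figures given above. A minimal check, with the tabulated critical value hard-coded as quoted in the text:

```python
from math import sqrt

d_bar, s_d, n = -3.2, 2.1, 10      # mean and SD of the (after - before) differences

t_stat = d_bar / (s_d / sqrt(n))   # H0: mean difference = 0
df = n - 1
t_critical = -1.833                # left-tailed, alpha = 0.05, df = 9 (tabulated value)
reject = t_stat < t_critical
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject}")
```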

Z-Test vs. t-Test: Comparison Table

Feature | Z-Test | t-Test
σ (population SD) | Known | Unknown (use s)
Typical sample size | n ≥ 30 | Any size (especially n < 30)
Distribution used | Standard Normal (Z) | Student's t-distribution
Degrees of freedom | Not applicable | df = n − 1 (or n₁+n₂−2)
Critical value (α=0.05, 2-tail) | ±1.96 | Varies by df (e.g., ±2.045 for df=29)
Real-world use | Less common (σ rarely known) | Most common in practice
Variance assumed | Known σ² | Estimated from sample

Chi-Square Test (χ²)

Definition

The chi-square test (χ²) is a non-parametric statistical test used to analyze categorical data. It tests whether observed frequencies differ from expected frequencies, or whether two categorical variables are independent of each other.

When to Use Chi-Square

  • Data is categorical (nominal or ordinal)
  • You are counting frequencies or proportions
  • Testing independence between two variables, or whether observed data fits a distribution
  • Each cell in a contingency table should have expected frequency ≥ 5

The Chi-Square Formula

χ² = Σ [(O − E)² / E]

Where:
O = Observed frequency in each category
E = Expected frequency under H₀
Σ = Sum over all categories or cells

Two Types of Chi-Square Tests

1. Chi-Square Goodness-of-Fit Test

Tests whether a sample distribution matches a hypothesized (expected) distribution. Uses one categorical variable.

Marketing Example

A store expects equal customer visits across 4 days (Monday–Thursday), with 25% each day. Observed: Mon=40, Tue=25, Wed=30, Thu=25 (n=120). Expected = 30 each.

χ² = (40−30)²/30 + (25−30)²/30 + (30−30)²/30 + (25−30)²/30
χ² = 100/30 + 25/30 + 0/30 + 25/30
χ² = 3.33 + 0.83 + 0 + 0.83
χ² = 5.00

df = categories − 1 = 3. χ²_critical (α=0.05, df=3) = 7.815. Since 5.00 < 7.815, fail to reject H₀. Visitor distribution is consistent with expectations.
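The goodness-of-fit arithmetic for the store example is short enough to check directly. This sketch computes the statistic from the observed counts and compares it with the tabulated critical value quoted in the text.

```python
observed = {"Mon": 40, "Tue": 25, "Wed": 30, "Thu": 25}

total = sum(observed.values())
expected = total / len(observed)   # equal share per day under H0

chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
chi_critical = 7.815               # alpha = 0.05, df = 3 (tabulated value)
reject = chi_square > chi_critical
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject}")
```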

2. Chi-Square Test of Independence

Tests whether two categorical variables are related or independent using a contingency table. Uses two categorical variables.

Audit / HR Example

An auditor examines whether department (Sales vs. Finance) is independent of compliance status (Pass vs. Fail). A 2×2 contingency table is created with observed counts. Expected frequencies are calculated as: E = (Row Total × Column Total) / Grand Total. The chi-square statistic is then compared to the critical value with df = (rows − 1)(columns − 1) = 1.

If the result is significant, it means department and compliance status are not independent — compliance rates differ by department.
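The text does not give counts for this example, so the sketch below uses a hypothetical 2×2 table invented purely for illustration. It shows how expected frequencies and the statistic follow from the row and column totals; the function name is also an assumption.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = (row total x column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# Hypothetical counts, invented for illustration only.
#         Pass  Fail
table = [[40, 10],    # Sales
         [45, 5]]     # Finance
chi_square, df = chi_square_independence(table)
chi_critical = 3.841  # alpha = 0.05, df = 1 (tabulated value)
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {chi_square > chi_critical}")
```

For these made-up counts the statistic stays below the critical value, so the data would not contradict independence. Every expected count is at least 5, so the usual rule of thumb for the test is met.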

Degrees of Freedom for Chi-Square

Test | df Formula
Goodness-of-fit | df = k − 1 (where k = number of categories)
Test of Independence | df = (rows − 1)(columns − 1)

Choosing the Right Statistical Test

Situation | Data Type | n | σ Known? | Recommended Test
Compare sample mean to known value | Continuous | ≥30 | Yes | Z-test
Compare sample mean to known value | Continuous | Any | No | One-sample t-test
Compare means of two independent groups | Continuous | Any | No | Independent t-test
Compare before/after measurements (same subjects) | Continuous | Any | No | Paired t-test
Test if categorical data fits an expected distribution | Categorical | N/A | N/A | Chi-square goodness-of-fit
Test relationship between two categorical variables | Categorical | N/A | N/A | Chi-square independence
Compare means of 3+ groups | Continuous | Any | No | ANOVA (F-test)
Relationship between two continuous variables | Continuous | Any | N/A | Pearson correlation / regression

A useful decision rule: if your outcome variable is numerical, use Z or t-tests; if it is categorical, use chi-square. If you are comparing more than two groups, consider ANOVA (Module 7).

Hypothesis Testing in Finance

Investment Performance Analysis

Portfolio managers use hypothesis testing to determine whether a fund's returns differ significantly from a benchmark. A one-sample t-test can test whether mean excess returns (alpha) are significantly above zero. Without statistical testing, a fund with a few good years might appear to be outperforming when its results are indistinguishable from random variation.

Risk Assessment

Finance professionals test whether the variance (or Value-at-Risk) of a portfolio falls within acceptable parameters. They also use hypothesis tests to compare the risk profiles of different asset classes or to validate risk models against historical data.

Market Research and Pricing

Two-sample t-tests are used to compare market prices or transaction volumes between two periods (e.g., pre- and post-policy change). Chi-square tests are used to analyze whether investor sentiment (positive, neutral, negative) is independent of market sector.

Event Studies

Finance researchers use t-tests to detect abnormal stock returns around specific events (earnings announcements, mergers, regulatory changes). A statistically significant abnormal return implies the market incorporated new information.

Hypothesis Testing in Auditing

Compliance Testing

Auditors use hypothesis testing to determine whether an organization's control error rate exceeds the tolerable deviation rate. A one-sample proportion Z-test can assess: H₀: population error rate ≤ 5% (acceptable). If the sample error rate is significantly above 5%, the auditor concludes internal controls are inadequate.
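A one-sample proportion Z-test of this kind can be sketched as follows. The sample figures (16 deviations in 200 tested items) are hypothetical and chosen only to illustrate the calculation; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_right(errors, n, p0):
    """Right-tailed one-sample proportion Z-test; returns (p_hat, z, p_value)."""
    p_hat = errors / n
    std_error = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / std_error
    return p_hat, z, 1 - NormalDist().cdf(z)

# Hypothetical sample: 16 control deviations found in 200 tested items.
p_hat, z, p_value = proportion_z_right(errors=16, n=200, p0=0.05)
print(f"sample rate = {p_hat:.1%}, Z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up numbers the sample rate of 8% would be judged significantly above the 5% tolerable rate at α = 0.05. The normal approximation behind the test is usually considered reasonable when n·p₀ and n·(1 − p₀) are both comfortably large.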

Audit Sampling

Rather than examining every transaction, auditors sample and use hypothesis tests to draw conclusions about the entire population of transactions. The tests determine whether the observed sample errors are consistent with an acceptable population error rate.

Fraud Investigation

Statistical anomalies detected through hypothesis testing can be indicators of fraud. For example, testing whether the frequency distribution of leading digits in financial data follows Benford's Law (using chi-square goodness-of-fit) is a known fraud-detection technique. Significant deviation from the expected distribution may signal fabricated entries.
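Under Benford's Law the expected share of leading digit d is log₁₀(1 + 1/d), so the test reduces to a goodness-of-fit calculation with 9 categories (df = 8). The sketch below is an illustration, not an audit tool: the function name is an assumption, and the demonstration data are the first 1,000 powers of 2, a sequence known to follow Benford's Law closely, rather than real financial records.

```python
from math import log10

def benford_chi_square(values):
    """Chi-square goodness-of-fit statistic of leading digits against Benford's Law.

    Assumes every value is a nonzero number.
    """
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        first_digit = int(str(abs(v)).lstrip("0.")[0])   # first non-zero digit
        counts[first_digit] += 1
    n = len(values)
    chi_square = 0.0
    for d, observed in counts.items():
        expected = n * log10(1 + 1 / d)                  # Benford proportion for digit d
        chi_square += (observed - expected) ** 2 / expected
    return chi_square

powers_of_two = [2 ** k for k in range(1000)]
chi_square = benford_chi_square(powers_of_two)
chi_critical = 15.507   # alpha = 0.05, df = 8 (tabulated value)
print(f"chi-square = {chi_square:.3f}, reject H0: {chi_square > chi_critical}")
```

A statistic above the critical value would flag the data for closer review; on its own it is not proof of fabrication.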

Hypothesis Testing in Business

A/B Testing for Products and Marketing

Businesses run A/B tests to compare two versions of a product, webpage, or advertisement. A two-sample Z-test or t-test determines whether the observed difference in conversion rates between Version A and Version B is statistically significant — or merely due to chance. A/B testing is the direct industrial application of hypothesis testing.

Customer Satisfaction Analysis

Companies use t-tests to compare satisfaction scores before and after a service change. Chi-square tests assess whether customer satisfaction ratings (Satisfied / Neutral / Dissatisfied) are independent of customer segment (new vs. returning).

Quality Control and Six Sigma

Manufacturing operations use Z-tests and t-tests to determine whether production processes meet specifications. Six Sigma methodology relies heavily on hypothesis testing: process changes are evaluated statistically, with the goal of driving the defect rate down to about 3.4 defects per million opportunities.

Common Hypothesis Testing Mistakes (and How to Avoid Them)

  1. Misinterpreting the p-value. The p-value is not the probability that H₀ is true — it is the probability of obtaining data this extreme if H₀ were true. Solution: Always state p-value interpretations correctly.
  2. Equating "fail to reject H₀" with "accept H₀." Not finding evidence against H₀ does not prove it is true. Solution: Use the phrase "fail to reject" rather than "accept."
  3. Confusing statistical significance with practical significance. A statistically significant result (p < 0.05) may have a trivially small effect size. Solution: Always report effect sizes (Cohen's d, eta-squared, etc.) alongside p-values.
  4. Choosing the wrong test. Using a Z-test when σ is unknown, or using t-tests for categorical data. Solution: Follow the test selection decision table.
  5. Ignoring test assumptions. Running a t-test on severely non-normal data with small samples. Solution: Always check normality, independence, and variance homogeneity assumptions.
  6. Using a one-tailed test when a two-tailed is appropriate. One-tailed tests are more powerful but only valid when the direction of difference is specified in advance. Solution: Default to two-tailed unless there is a strong theoretical justification.
  7. P-hacking (data dredging). Running multiple tests and only reporting the significant ones inflates the Type I error rate. Solution: Pre-register hypotheses; apply Bonferroni correction for multiple tests.
  8. Setting α after seeing the data. Choosing your significance level after peeking at the results. Solution: Always set α before data collection.
  9. Insufficient sample size. Underpowered studies (small n) miss real effects. Solution: Conduct power analysis before data collection to determine adequate n.
  10. Assuming causation from correlation. A statistically significant relationship does not imply that one variable causes another. Solution: Apply causal reasoning frameworks; use controlled experiments.
  11. Using mean comparisons on ordinal data. Running a t-test on Likert scale data (1–5 ratings). Solution: Use non-parametric tests (Mann-Whitney, Wilcoxon) for ordinal data.
  12. Ignoring outliers. Extreme values can inflate variance and distort test statistics. Solution: Identify and assess outliers; use robust statistical methods when necessary.
  13. Neglecting the effect of multiple comparisons. Testing 20 variables at α = 0.05 means about one false positive is expected by chance even if every null hypothesis is true (for independent tests, the chance of at least one is roughly 64%). Solution: Apply false discovery rate (FDR) corrections in multiple-testing contexts.
  14. Over-relying on hypothesis tests without confidence intervals. A confidence interval conveys both significance and the size/direction of effects. Solution: Always accompany hypothesis test results with confidence intervals.
  15. Not reporting null results. Publication bias toward significant results distorts the scientific record. Solution: Report all tests, including non-significant findings, with context.
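The multiple-comparisons problem in mistakes 7 and 13 comes down to simple arithmetic. The sketch below assumes 20 independent tests in which every null hypothesis is true, and shows how the Bonferroni correction pulls the overall false-positive risk back under α.

```python
alpha, m = 0.05, 20   # 20 independent tests, every null hypothesis true

expected_false_positives = m * alpha                 # average number of false alarms
p_at_least_one = 1 - (1 - alpha) ** m                # family-wise error rate

bonferroni_alpha = alpha / m                         # per-test threshold after correction
fwer_bonferroni = 1 - (1 - bonferroni_alpha) ** m    # family-wise rate after correction

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one), uncorrected: {p_at_least_one:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
print(f"P(at least one), corrected: {fwer_bonferroni:.3f}")
```

The price of the correction is lower power for each individual test, which is one reason FDR methods are often preferred when many tests are run.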

Practical Case Study: Retail Bank Loan Processing

Full Case Study

Business Problem: FastTrack Bank's management believes its digital loan processing system has reduced the average approval time from the historical average of 72 hours. A quality team samples 25 recent applications and records the following approval times (in hours):

60, 65, 68, 70, 71, 63, 75, 72, 69, 67, 64, 66, 70, 73, 61, 68, 74, 62, 65, 69, 70, 63, 67, 71, 68

Step 1 — Hypotheses:

  • H₀: μ ≥ 72 (processing time has not decreased)
  • H₁: μ < 72 (processing time has decreased) — left-tailed test

Step 2 — Significance Level: α = 0.05

Step 3 — Test Selection: One-sample t-test (σ unknown, n = 25)

Step 4 — Sample Statistics:

Sample mean (x̄) = 67.64 hours
Sample SD (s) = 4.05 hours
n = 25

Step 5 — Calculate Test Statistic:

t = (x̄ − μ₀) / (s / √n)
t = (67.64 − 72) / (4.05 / √25)
t = (−4.36) / (4.05 / 5)
t = (−4.36) / 0.810
t = −5.38

Step 6 — Critical Value and Decision:
df = 24. t_critical (left-tailed, α = 0.05) = −1.711.
Since −5.38 < −1.711, we reject H₀.
p-value < 0.0001.

Step 7 — Conclusion: At the 5% significance level, there is very strong statistical evidence that FastTrack Bank's digital loan processing system has significantly reduced the average approval time from 72 hours. The sample mean of 67.64 hours represents an estimated reduction of approximately 4.36 hours. Management's decision to invest in the digital system is statistically justified.

Business Recommendation: FastTrack should continue and scale the digital processing system. Future analysis should examine whether the reduction is consistent across all loan types and customer segments.
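Recomputing the case-study statistics directly from the 25 listed approval times gives a mean of 67.64 hours, a sample standard deviation of about 4.05 hours, and t ≈ −5.38. The sketch below uses only the standard library, so the critical value is the tabulated figure quoted above rather than a computed one.

```python
from math import sqrt
from statistics import mean, stdev

hours = [60, 65, 68, 70, 71, 63, 75, 72, 69, 67, 64, 66, 70,
         73, 61, 68, 74, 62, 65, 69, 70, 63, 67, 71, 68]
mu0 = 72                               # historical average approval time

n = len(hours)
xbar = mean(hours)
s = stdev(hours)                       # sample SD (n - 1 in the denominator)
t_stat = (xbar - mu0) / (s / sqrt(n))

t_critical = -1.711                    # left-tailed, alpha = 0.05, df = 24 (tabulated value)
print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}, t = {t_stat:.2f}")
print(f"reject H0: {t_stat < t_critical}")
```

Using the sample standard deviation (dividing by n − 1) rather than the population formula matters here, because the t-test assumes s is estimated from the sample.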

Key Takeaways

Module 6 — Core Concepts to Remember

  • Hypothesis testing is a framework for making evidence-based decisions about population parameters using sample data.
  • The null hypothesis (H₀) represents the "no effect / no change" position; the alternative hypothesis (H₁) represents what we are trying to prove.
  • The significance level (α) is the threshold for rejecting H₀. The most common value is α = 0.05 (5%).
  • The p-value is the probability of obtaining results as extreme as observed, assuming H₀ is true. A small p-value (p < α) leads to rejection of H₀.
  • A Type I Error (false positive, probability = α) is rejecting a true H₀. A Type II Error (false negative, probability = β) is failing to reject a false H₀.
  • Use a Z-test when σ is known and n ≥ 30. Use a t-test when σ is unknown.
  • Use the chi-square test for categorical data — either to test goodness-of-fit or independence between two categorical variables.
  • Statistical significance does not equal practical importance. Always assess effect size.
  • The 7-step process: state hypotheses → choose α → select test → check assumptions → calculate test statistic → compare to critical value/p-value → draw conclusion.
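
The claim that a Type I Error occurs with probability α can be checked by simulation. The sketch below repeatedly draws samples from a population in which H₀ is true and counts how often a two-tailed Z-test rejects it. The population values (mean 100, σ 15), the sample size, the number of trials, and the seed are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for a one-sample Z-test with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * NormalDist().cdf(-abs(z))

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=42):
    """Share of tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # hypothetical population in which H0 holds
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1  # a false positive, because H0 is true by construction
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed false-positive rate: {rate:.3f} (alpha = 0.05)")
```

The observed rate should land close to 0.05, with some run-to-run variation if the seed is changed.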

Practice Exercises

Part A: 20 Conceptual Questions

  1. In your own words, explain the difference between a null hypothesis and an alternative hypothesis.
  2. Why is the null hypothesis always stated with an equality?
  3. What does a p-value of 0.03 mean in the context of a hypothesis test?
  4. A researcher finds p = 0.08 at α = 0.05. What is the decision, and what does it mean?
  5. Explain the difference between Type I and Type II errors using a medical testing analogy.
  6. Why can lowering α increase the risk of Type II error (for fixed n)?
  7. What is statistical power, and why does it matter?
  8. When would you use a one-tailed test instead of a two-tailed test?
  9. Explain why a statistically significant result is not necessarily practically meaningful.
  10. Why should you never change your significance level after seeing the data?
  11. What is the difference between a Z-test and a t-test? When is each appropriate?
  12. What are the assumptions of the independent samples t-test?
  13. Explain the chi-square test of independence. What is the null hypothesis in such a test?
  14. A researcher runs 20 hypothesis tests at α = 0.05. How many false positives should they expect by chance, even if all null hypotheses are true?
  15. What is the Bonferroni correction and when would you apply it?
  16. In auditing, which error type is potentially more dangerous: Type I or Type II? Explain your reasoning.
  17. What does "95% confidence level" mean in the context of hypothesis testing?
  18. Why is it incorrect to say "we accept the null hypothesis"?
  19. How does increasing the sample size affect the p-value? Why?
  20. What is "p-hacking," and why is it a serious problem in research?
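
Questions 6 and 7 concern the trade-off between α and power. As a rough numerical illustration, the sketch below computes the power of a two-tailed Z-test at two significance levels. The scenario is hypothetical: a claimed mean of 1000, a true mean of 994, σ = 20, and n = 49. The function name `z_test_power` is likewise an assumption.

```python
import math
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha):
    """Power of a two-tailed one-sample Z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True effect expressed in standard-error units
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Probability that the test statistic falls in either rejection region
    lower_tail = std_normal.cdf(-z_crit - shift)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    return lower_tail + upper_tail

# Hypothetical scenario: same data-generating process, two significance levels
power_05 = z_test_power(mu0=1000, mu_true=994, sigma=20, n=49, alpha=0.05)
power_01 = z_test_power(mu0=1000, mu_true=994, sigma=20, n=49, alpha=0.01)

print(f"alpha = 0.05: power = {power_05:.3f}, beta = {1 - power_05:.3f}")
print(f"alpha = 0.01: power = {power_01:.3f}, beta = {1 - power_01:.3f}")
```

Under these assumptions, tightening α from 0.05 to 0.01 lowers power and therefore raises β, which is the trade-off Question 6 asks about.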

Part B: 10 Numerical Problems (with Solutions)

Problem 1. A machine fills bags with 1 kg of flour (μ₀ = 1000g, σ = 20g). A sample of 49 bags gives x̄ = 994g. Test at α = 0.05 (two-tailed) whether the machine is working correctly.

Solution
Z = (994 − 1000) / (20/√49) = −6/2.857 = −2.10. Critical values: ±1.96. Since |−2.10| > 1.96, reject H₀. The machine is not filling correctly.
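
The solution to Problem 1 can be verified in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library to obtain the two-tailed p-value as well as the Z statistic; the function name `one_sample_z` is an illustrative assumption.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """Return (Z statistic, two-tailed p-value) for a one-sample Z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed

# Values from Problem 1
z, p = one_sample_z(sample_mean=994, mu0=1000, sigma=20, n=49)
reject_h0 = p < 0.05

print(f"Z = {z:.2f}, p = {p:.4f}, reject H0: {reject_h0}")
```

The p-value route and the critical-value route give the same decision, because p < 0.05 exactly when |Z| > 1.96.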

Problem 2. A tire company claims mean tire life = 50,000 km. A sample of 16 tires has x̄ = 48,500 km, s = 3,200 km. Test at α = 0.05 (left-tailed).

Solution
t = (48500 − 50000)/(3200/√16) = −1500/800 = −1.875. df = 15. t_crit = −1.753. Since −1.875 < −1.753, reject H₀. Tire life is significantly less than claimed.

Problem 3. Two sales teams: Team A (n=15, x̄=85K, s=12K) vs. Team B (n=15, x̄=80K, s=10K). Test at α = 0.05 if means differ significantly.

Solution
s_p² = (14×144 + 14×100)/28 = 122. t = (85 − 80)/√(122 × 2/15) = 5/4.033 = 1.240 (values in thousands). df = 28. t_crit = ±2.048. Since |1.240| < 2.048, fail to reject H₀. No significant difference.
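
The pooled-variance calculation in Problem 3 can be sketched as follows. The function name `pooled_t` is an assumption for illustration, the inputs are in thousands, and the critical value is the table figure quoted in the solution.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t-test with pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df, pooled_var

# Team A vs. Team B from Problem 3 (values in thousands)
t_stat, df, pooled_var = pooled_t(85, 12, 15, 80, 10, 15)

T_CRITICAL = 2.048  # two-tailed table value for df = 28, alpha = 0.05
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"s_p^2 = {pooled_var:.0f}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```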

Problem 4. Loan approval times before/after a policy change (10 pairs, with d = before − after). d̄ = 2.3 days, s_d = 1.8 days. Test at α = 0.05 if the policy reduced approval time.

Solution
t = 2.3/(1.8/√10) = 2.3/0.569 = 4.04. df=9. t_crit (right-tailed) = 1.833. Since 4.04 > 1.833, reject H₀. Policy significantly reduced approval time.

Problem 5. A die is rolled 60 times. Each face should appear 10 times. Observed: 8, 9, 12, 11, 11, 9. Test fairness using chi-square at α = 0.05.

Solution
χ² = (8−10)²/10 + (9−10)²/10 + (12−10)²/10 + (11−10)²/10 + (11−10)²/10 + (9−10)²/10 = 0.4+0.1+0.4+0.1+0.1+0.1 = 1.2. df=5. χ²_crit=11.07. Since 1.2 < 11.07, fail to reject H₀. There is no evidence that the die is unfair.
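
The goodness-of-fit statistic for Problem 5 is a simple sum, so it can be checked without any statistics library. This is a minimal sketch; the critical value is the table figure from the solution, not a computed quantile.

```python
# Observed counts for faces 1-6 from Problem 5
observed = [8, 9, 12, 11, 11, 9]

# Under H0 (fair die), every face has the same expected count
expected_count = sum(observed) / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected_count) ** 2 / expected_count for o in observed)
df = len(observed) - 1

CHI2_CRITICAL = 11.07  # table value for df = 5, alpha = 0.05
reject_h0 = chi_square > CHI2_CRITICAL

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: chi-square = 1.20, df = 5, reject H0: False
```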

Problems 6–10: Available in the module workbook (extended numerical problems covering two-proportion Z-tests, chi-square independence tests, power analysis, ANOVA preview, and finance applications).

Multiple Choice Quiz: 30 Questions

Question 1 of 30
What is the primary purpose of the null hypothesis?
  • A) To prove the researcher's theory
  • B) To represent the default assumption of no effect or no change
  • C) To identify outliers in the data
  • D) To maximize the p-value
✓ Correct: B — The null hypothesis represents the default position: no effect, no difference, no change. It is the baseline we test against.
Question 2 of 30
A p-value of 0.03 at a significance level of 0.05 means:
  • A) There is a 3% probability the null hypothesis is true
  • B) There is a 97% probability the alternative hypothesis is true
  • C) There is sufficient evidence to reject the null hypothesis
  • D) The result is not statistically significant
✓ Correct: C — Since p = 0.03 < α = 0.05, we have sufficient evidence to reject H₀. Note: the p-value is NOT the probability that H₀ is true.
Question 3 of 30
Which error type involves rejecting a null hypothesis that is actually true?
  • A) Type II Error
  • B) Beta Error
  • C) Type I Error
  • D) Power Error
✓ Correct: C — A Type I Error (false positive) occurs when we reject a true null hypothesis. Its probability equals α.
Question 4 of 30
When is a Z-test preferred over a t-test?
  • A) When the sample size is less than 30
  • B) When the population standard deviation is unknown
  • C) When the population standard deviation is known and n ≥ 30
  • D) When comparing categorical variables
✓ Correct: C — Z-tests require a known population standard deviation and are most appropriate for large samples (n ≥ 30).
Question 5 of 30
Statistical power is defined as:
  • A) 1 − α
  • B) 1 − β
  • C) α + β
  • D) α / β
✓ Correct: B — Power = 1 − β. It is the probability of correctly rejecting a false null hypothesis.
Question 6 of 30
A researcher wants to test whether two independent groups differ in exam performance. Which test is most appropriate?
  • A) Paired t-test
  • B) Chi-square test
  • C) Independent samples t-test
  • D) Goodness-of-fit test
✓ Correct: C — The independent samples t-test compares the means of two unrelated groups on a continuous outcome variable.
Question 7 of 30
A chi-square test of independence tests whether:
  • A) A sample mean equals a hypothesized value
  • B) Two categorical variables are related or independent
  • C) A dataset is normally distributed
  • D) Two group variances are equal
✓ Correct: B — The chi-square test of independence assesses whether two categorical variables are statistically associated or independent.
Question 8 of 30
If you "fail to reject H₀," this means:
  • A) H₀ is proven true
  • B) H₁ is proven false
  • C) The data provides insufficient evidence against H₀
  • D) A Type II error has been made
✓ Correct: C — Failing to reject H₀ means the data does not provide sufficient evidence to conclude H₀ is false. It does not prove H₀ is true.
Question 9 of 30
An auditor runs a chi-square test and observes χ² = 8.2 with 3 degrees of freedom at α = 0.05 (critical value = 7.815). What is the correct conclusion?
  • A) Fail to reject H₀ — the data fits expectations
  • B) Reject H₀ — the observed and expected frequencies differ significantly
  • C) Accept H₁ with 95% certainty
  • D) The test is inconclusive
✓ Correct: B — Since 8.2 > 7.815 (critical value), we reject H₀ and conclude there is a statistically significant difference between observed and expected frequencies.
Question 10 of 30
The significance level (α) directly controls the probability of which error?
  • A) Type II Error
  • B) Both Type I and Type II Errors equally
  • C) Type I Error
  • D) Sampling Error
✓ Correct: C — The significance level (α) directly sets the maximum acceptable probability of a Type I Error (false positive).
Question 11 of 30
Which of the following is the correct formula for a one-sample Z-test?
  • A) Z = (x̄ − μ₀) / (s / √n)
  • B) Z = (x̄ − μ₀) / (σ / √n)
  • C) Z = (x̄ − μ₀) × σ
  • D) Z = n × (x̄ − μ₀)
✓ Correct: B — The Z-test formula uses σ (population standard deviation), not s (sample standard deviation). If σ is unknown, use the t-test.
Question 12 of 30
A paired t-test is appropriate when:
  • A) Two independent groups are compared
  • B) The same subjects are measured twice (before and after)
  • C) The population standard deviation is known
  • D) The data is categorical
✓ Correct: B — The paired t-test is used when measurements are naturally linked (e.g., before/after on the same subjects), analyzing the differences.
Question 13 of 30
For a two-tailed Z-test at α = 0.05, what are the critical values?
  • A) ±1.282
  • B) ±1.645
  • C) ±1.960
  • D) ±2.576
✓ Correct: C — The critical values for a two-tailed Z-test at α = 0.05 are ±1.96. They bound the central 95% of the standard normal distribution, leaving 2.5% in each tail.
Question 14 of 30
Running multiple hypothesis tests on the same dataset without correction increases the risk of:
  • A) Type II Errors only
  • B) Type I Errors (false positives)
  • C) Reducing statistical power
  • D) Increasing the p-value
✓ Correct: B — Multiple testing inflates the family-wise Type I error rate. With 20 tests at α = 0.05, you expect one false positive on average even if every null hypothesis is true.
Question 15 of 30
Benford's Law applied to financial data in fraud detection is typically tested using:
  • A) Paired t-test
  • B) Z-test
  • C) Chi-square goodness-of-fit test
  • D) F-test
✓ Correct: C — Chi-square goodness-of-fit is used to test whether observed digit frequencies in financial data conform to the expected Benford's Law distribution.
Question 16 of 30
An alternative hypothesis of H₁: μ > μ₀ indicates:
  • A) A two-tailed test
  • B) A left-tailed test
  • C) A right-tailed test
  • D) A non-directional test
✓ Correct: C — When H₁ states μ > μ₀ (greater than), the rejection region is in the right tail of the distribution — a right-tailed (upper) test.
Question 17 of 30
Which statement about the p-value is CORRECT?
  • A) A small p-value proves H₁ is true
  • B) The p-value is the probability that H₀ is true
  • C) The p-value is the probability of observing results this extreme if H₀ were true
  • D) A p-value must always be greater than α to be meaningful
✓ Correct: C — The p-value is calculated assuming H₀ is true and measures how likely the observed data (or more extreme) would be in that scenario.
Question 18 of 30
What degrees of freedom does a one-sample t-test use with n = 20 observations?
  • A) 20
  • B) 21
  • C) 19
  • D) 18
✓ Correct: C — For a one-sample t-test, df = n − 1 = 20 − 1 = 19.
Question 19 of 30
A marketer tests a new ad and concludes it increased conversions when it actually didn't. This is:
  • A) A correct rejection of H₀
  • B) A Type II Error
  • C) A Type I Error
  • D) A power failure
✓ Correct: C — Concluding there is an effect (increased conversions) when there actually isn't is a Type I Error (false positive).
Question 20 of 30
For a chi-square test of independence with a 3×4 contingency table, the degrees of freedom are:
  • A) 12
  • B) 11
  • C) 6
  • D) 7
✓ Correct: C — df = (rows − 1)(columns − 1) = (3−1)(4−1) = 2 × 3 = 6.
Question 21 of 30
Which of the following best increases statistical power?
  • A) Decreasing the significance level (α)
  • B) Decreasing the sample size
  • C) Increasing the sample size
  • D) Using a two-tailed test instead of one-tailed
✓ Correct: C — Larger sample sizes reduce standard error, making it easier to detect true effects. This increases power (1 − β).
Question 22 of 30
In finance, hypothesis testing is used to assess whether a fund's excess returns are statistically significant. This is an application of:
  • A) Chi-square goodness-of-fit
  • B) One-sample or two-sample t-test
  • C) Paired t-test only
  • D) Non-parametric testing only
✓ Correct: B — Finance professionals typically use one-sample t-tests (to test if mean alpha > 0) or two-sample t-tests (to compare two strategies).
Question 23 of 30
The chi-square formula is χ² = Σ [(O − E)² / E]. What do O and E represent?
  • A) Optimal and Estimated values
  • B) Observed and Expected frequencies
  • C) Overall and Extreme values
  • D) Output and Error values
✓ Correct: B — O = Observed frequency (from your data) and E = Expected frequency (calculated under the null hypothesis).
Question 24 of 30
If you want to test whether customer satisfaction level (Low/Medium/High) is related to age group (18–30 / 31–50 / 51+), which test is appropriate?
  • A) One-sample t-test
  • B) Z-test
  • C) Chi-square test of independence
  • D) Paired t-test
✓ Correct: C — Both variables (satisfaction level and age group) are categorical. The chi-square test of independence tests whether they are associated.
Question 25 of 30
A quality control engineer tests whether the mean weight of packages equals 500g. Which test should she use if σ is unknown and n = 20?
  • A) Z-test
  • B) One-sample t-test
  • C) Chi-square test
  • D) ANOVA
✓ Correct: B — With unknown σ and small n, the one-sample t-test is appropriate for comparing a sample mean to a known target value.
Question 26 of 30
The confidence level corresponding to a significance level of α = 0.01 is:
  • A) 90%
  • B) 95%
  • C) 98%
  • D) 99%
✓ Correct: D — Confidence level = (1 − α) × 100% = (1 − 0.01) × 100% = 99%.
Question 27 of 30
Which of the following is NOT an assumption of the one-sample t-test?
  • A) The sample is randomly drawn
  • B) The data is approximately normally distributed (or n is large)
  • C) The population standard deviation is known
  • D) Observations are independent of each other
✓ Correct: C — The t-test is specifically used when σ is UNKNOWN. If σ is known, the Z-test is appropriate.
Question 28 of 30
A bank samples 50 loan applications and finds a 6% error rate. The tolerable error rate is 5%. After testing at α = 0.05, p = 0.07. The auditor should:
  • A) Reject H₀ — the error rate is unacceptable
  • B) Fail to reject H₀ — insufficient evidence to conclude error rate exceeds 5%
  • C) Accept that error rate = exactly 5%
  • D) Repeat the test with α = 0.10
✓ Correct: B — Since p = 0.07 > α = 0.05, we fail to reject H₀. There is insufficient evidence at the 5% level to conclude the error rate exceeds the tolerable threshold.
Question 29 of 30
What is the primary reason we can never "prove" the null hypothesis through hypothesis testing?
  • A) The null hypothesis is always false
  • B) A lack of evidence against H₀ doesn't rule out undetected effects, especially with small samples
  • C) Statistical tests have no mathematical basis
  • D) The p-value always equals zero
✓ Correct: B — Failure to reject H₀ could result from the effect being genuinely absent OR from insufficient statistical power. We can only conclude insufficient evidence, not proof.
Question 30 of 30
Which step in the hypothesis testing process should always occur BEFORE data collection?
  • A) Calculating the test statistic
  • B) Determining the p-value
  • C) Stating the hypotheses and choosing the significance level
  • D) Drawing the conclusion
✓ Correct: C — Hypotheses and the significance level must be established before data collection to prevent bias, p-hacking, and post-hoc rationalization of results.
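
The multiple-testing figures that appear in Question 14 can be worked out directly. The sketch below computes the expected number of false positives, the family-wise error rate, and the Bonferroni-adjusted threshold. The family-wise formula 1 − (1 − α)^m assumes the tests are independent, which is a simplifying assumption rather than something the quiz states, and the function names are illustrative.

```python
def expected_false_positives(alpha, num_tests):
    """Average number of false positives when every null hypothesis is true."""
    return alpha * num_tests

def family_wise_error_rate(alpha, num_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests

alpha, num_tests = 0.05, 20
expected_fp = expected_false_positives(alpha, num_tests)
fwer_uncorrected = family_wise_error_rate(alpha, num_tests)
alpha_adjusted = bonferroni_alpha(alpha, num_tests)
fwer_corrected = family_wise_error_rate(alpha_adjusted, num_tests)

print(f"Expected false positives: {expected_fp:.1f}")
print(f"Family-wise error rate without correction: {fwer_uncorrected:.3f}")
print(f"Bonferroni per-test alpha: {alpha_adjusted:.4f}")
print(f"Family-wise error rate with correction: {fwer_corrected:.3f}")
```

Under the independence assumption, 20 uncorrected tests carry roughly a 64% chance of at least one false positive, while the Bonferroni threshold keeps that chance just under 5%.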

Frequently Asked Questions (FAQs)

1. What is hypothesis testing?
Hypothesis testing is a statistical method used to make data-driven decisions about population parameters. It involves stating a null hypothesis (no effect) and an alternative hypothesis (there is an effect), collecting sample data, calculating a test statistic, and using a p-value to decide whether to reject the null hypothesis.
2. What is a null hypothesis?
The null hypothesis (H₀) is the default claim that there is no effect, no difference, or no relationship between variables. It always contains an equality and represents the "status quo," which is retained unless the data provide sufficient evidence against it.
3. What is an alternative hypothesis?
The alternative hypothesis (H₁) is the claim the researcher wants to support. It asserts that there is an effect, a difference, or a relationship. It is supported when there is sufficient evidence to reject the null hypothesis.
4. What is a p-value?
The p-value is the probability of observing a test result at least as extreme as the actual result, assuming the null hypothesis is true. A smaller p-value provides stronger evidence against the null hypothesis. It is NOT the probability that the null hypothesis is true.
5. What does p < 0.05 mean?
A p-value less than 0.05 means that, if the null hypothesis were true, there would be less than a 5% probability of observing data at least as extreme as your sample. This is considered statistically significant, and we reject H₀ at the 5% significance level.
6. What is statistical significance?
A result is statistically significant when the p-value is less than the chosen significance level (α), meaning the evidence against the null hypothesis is strong enough to reject it. Statistical significance does not necessarily imply practical or real-world importance.
7. When should I use a Z-test?
Use a Z-test when the population standard deviation (σ) is known and either the sample is large (n ≥ 30) or the population is approximately normally distributed. In most real-world applications, σ is unknown, so the t-test is more common.
8. When should I use a t-test?
Use a t-test when the population standard deviation is unknown and must be estimated from the sample. The t-test is appropriate for any sample size, though it is especially important for small samples (n < 30) where the Z approximation breaks down.
9. What is the difference between a one-tailed and two-tailed test?
A two-tailed test tests whether the parameter is different from the null value in either direction (H₁: μ ≠ μ₀). A one-tailed test tests for a difference in a specific direction — either greater than (right-tailed) or less than (left-tailed). One-tailed tests are more powerful for detecting effects in the specified direction.
10. What is a Type I Error?
A Type I Error occurs when we incorrectly reject a true null hypothesis — a false positive. The probability of a Type I Error is equal to the significance level (α). Example: concluding a drug is effective when it actually isn't.
11. What is a Type II Error?
A Type II Error occurs when we fail to reject a false null hypothesis — a false negative. Its probability is denoted β. Example: concluding a drug is not effective when it actually is. Type II errors are reduced by increasing sample size.
12. What is a chi-square test?
A chi-square test is a non-parametric statistical test for categorical data. It comes in two forms: the goodness-of-fit test (does observed data match an expected distribution?) and the test of independence (are two categorical variables related?).
13. How is hypothesis testing used in finance?
In finance, hypothesis testing is used to assess investment performance (do returns significantly exceed a benchmark?), validate risk models, detect market anomalies, and analyze financial events. T-tests are commonly used to test whether portfolio alpha is significantly different from zero.
14. How is hypothesis testing used in auditing?
Auditors use hypothesis testing in statistical audit sampling to determine whether error rates exceed tolerable thresholds, in compliance testing, and in fraud detection (e.g., Benford's Law chi-square analysis). It allows auditors to draw defensible conclusions about entire populations from sample data.
15. How is hypothesis testing used in business?
Businesses use hypothesis testing in A/B testing (comparing two product versions or marketing campaigns), quality control, customer satisfaction analysis, and operations management. It provides a rigorous basis for making data-driven business decisions.
16. What is the significance level (alpha)?
The significance level (α) is the threshold probability at which we decide to reject the null hypothesis. Common values are 0.05 (5%), 0.01 (1%), and 0.10 (10%). It represents the maximum acceptable probability of making a Type I Error.
17. What is statistical power?
Statistical power (1 − β) is the probability that a hypothesis test correctly detects a real effect. High power (≥ 0.80 is a common standard) reduces the risk of a Type II Error. Power is increased by using larger samples, higher α levels, or more sensitive tests.
18. Can a hypothesis test prove something with certainty?
No. Hypothesis testing provides probabilistic evidence, not absolute proof. A significant result means we have strong evidence against the null hypothesis, but it does not prove the alternative is definitively true. Statistics deals in probabilities, not certainties.
19. What is p-hacking?
P-hacking refers to manipulating data analysis (running many tests, removing data points, changing hypotheses) until a statistically significant p-value is obtained. It is a serious form of research misconduct that inflates false positive rates.
20. What are the degrees of freedom in a t-test?
For a one-sample t-test: df = n − 1. For an independent samples t-test with pooled variance: df = n₁ + n₂ − 2. For a paired t-test: df = n − 1 (where n is the number of pairs). Degrees of freedom determine the shape of the t-distribution used to find the critical value.
21. What is the difference between one-sample and two-sample tests?
A one-sample test compares a single sample mean (or proportion) to a known or hypothesized value. A two-sample test compares the means (or proportions) of two separate groups to see if they differ from each other.
22. What is the Central Limit Theorem's role in hypothesis testing?
The Central Limit Theorem (CLT) states that means of sufficiently large samples (n ≥ 30 is a common rule of thumb) are approximately normally distributed, regardless of the shape of the population distribution, provided the population variance is finite. This is why Z-tests and t-tests are valid for large samples even when the population is not normal.
23. What is the Bonferroni correction?
The Bonferroni correction is a method to control the family-wise Type I error rate when conducting multiple hypothesis tests simultaneously. The adjusted significance level for each test is α_adjusted = α / number of tests. For example, with 5 tests at α = 0.05, each test uses α = 0.01.
24. Does statistical significance mean practical importance?
No. With large samples, even trivially small differences can become statistically significant. Practical importance is measured by effect size (e.g., Cohen's d). A statistically significant result with a tiny effect size may have no meaningful real-world implication.
25. What is the goodness-of-fit test?
A chi-square goodness-of-fit test compares observed frequencies in a dataset to expected frequencies under a theoretical distribution (e.g., uniform, normal, or custom). It tests whether the data "fits" the expected distribution well enough to be consistent with it.
26. What sample size is needed for a hypothesis test?
The required sample size depends on: the desired power (typically 0.80), the significance level (α), and the minimum effect size you want to detect. Larger effects require smaller samples; detecting small effects requires much larger samples. Use power analysis formulas or software to calculate required n.
27. What is a non-parametric test?
Non-parametric tests do not assume a specific population distribution (e.g., normality). The chi-square test is non-parametric. Other examples include the Mann-Whitney U test (alternative to independent t-test) and the Wilcoxon signed-rank test (alternative to paired t-test). They are used when parametric assumptions are violated.
28. When should you use a paired t-test versus an independent t-test?
Use a paired t-test when measurements are matched or related (same person measured twice, or matched pairs). Use an independent t-test when two completely separate groups are compared. Using the wrong test leads to incorrect results.
29. What is inferential statistics?
Inferential statistics is the branch of statistics that uses sample data to draw conclusions (make inferences) about a larger population. Hypothesis testing is a core technique within inferential statistics, alongside confidence intervals and estimation.
30. What software can I use to perform hypothesis tests?
Common tools include: Microsoft Excel (using Data Analysis ToolPak), R (built-in t.test, chisq.test functions), Python (scipy.stats library), SPSS, Stata, and SAS. For business and audit contexts, Excel and specialized audit software like ACL/Galvanize are widely used.
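
As a small illustration of the sample-size question raised in FAQ 26, the sketch below applies the standard normal-approximation formula for a two-tailed one-sample Z-test, n = ((z_{α/2} + z_β) · σ / δ)², using only the Python standard library. The function name `required_sample_size` and the input values (σ = 20, minimum detectable differences of 6 and 3) are hypothetical, and a t-test would need a slightly larger n than this approximation suggests.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, min_effect, alpha=0.05, power=0.80):
    """Approximate n for a two-tailed one-sample Z-test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    n = ((z_alpha + z_beta) * sigma / min_effect) ** 2
    return math.ceil(n)  # round up to a whole observation

# Hypothetical inputs: sigma = 20, smallest difference worth detecting = 6 or 3
n_large_effect = required_sample_size(sigma=20, min_effect=6)
n_small_effect = required_sample_size(sigma=20, min_effect=3)

print(f"n needed to detect a difference of 6: {n_large_effect}")
print(f"n needed to detect a difference of 3: {n_small_effect}")
```

Halving the effect size roughly quadruples the required sample, which is why detecting small effects demands much larger samples.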

Final Module Summary

Module 6 has taken you through one of the most important analytical frameworks in applied statistics: hypothesis testing. Here is a complete recap of everything covered:

Foundations: Hypothesis testing gives analysts a principled, mathematical way to determine whether sample evidence is strong enough to support a claim about a population. The framework begins with two competing hypotheses: the null hypothesis (H₀) — the default claim of no effect — and the alternative hypothesis (H₁) — the claim we wish to support.

Decision Rules: The significance level (α) is the threshold for our decision. The p-value, the probability of observing data at least this extreme assuming H₀ is true, is compared to α. When p < α, we reject H₀ and conclude results are statistically significant.

Errors: Two types of mistakes are possible. A Type I Error (false positive, probability = α) means rejecting a true H₀. A Type II Error (false negative, probability = β) means failing to reject a false H₀. Statistical power (1 − β) measures our ability to detect real effects.

Tests Covered:

  • Z-Test: For large samples with known population standard deviation. Uses the standard normal distribution.
  • t-Test (one-sample, independent, paired): For unknown σ. The most widely used test in practice. Uses the t-distribution with appropriate degrees of freedom.
  • Chi-Square Test (goodness-of-fit and independence): For categorical data. Tests whether observed frequencies match expected frequencies or whether two categorical variables are related.

Applications: Hypothesis testing drives decision-making across finance (investment analysis, event studies), auditing (compliance testing, fraud detection), and business (A/B testing, quality control, customer analysis).

Best Practices: Always set hypotheses and α before data collection. Report effect sizes alongside p-values. Verify test assumptions. Avoid p-hacking and multiple testing inflation. Distinguish statistical significance from practical importance.

Module Completion

You have now completed Module 6: Hypothesis Testing. You are equipped to design, execute, and interpret hypothesis tests in academic and professional contexts.

Continue Learning

Module 7: Regression and Applications

Next, you will learn how to model relationships between variables using simple and multiple linear regression — essential for forecasting, financial modeling, and research analysis.
