Introduction: Why Hypothesis Testing Matters
Every day, decisions are made based on data. A pharmaceutical company wants to know whether a new drug lowers blood pressure more effectively than an existing treatment. A bank wants to know whether a new credit-scoring model reduces loan defaults. A marketing team wants to know whether a new email subject line increases open rates. In every one of these cases, the answer cannot come from opinion or instinct — it must come from evidence.
Hypothesis testing is the statistical engine that converts data into evidence. It gives analysts, researchers, and decision-makers a structured, mathematically rigorous way to determine whether what they observe in a sample is real — or whether it could have occurred by chance.
Without hypothesis testing, data science and applied statistics would be little more than description. With it, we can infer truths about populations from samples, validate scientific claims, test business assumptions, and support regulatory decisions with defensible logic.
This module — Module 6 of the Applied Statistics course — is dedicated entirely to this foundational method. By the end, you will understand every component of a hypothesis test, know which test to choose for which situation, and be able to apply the process in finance, auditing, business, and research contexts.
After completing this module, you will be able to: (1) state null and alternative hypotheses correctly; (2) interpret p-values and significance levels; (3) distinguish Type I from Type II errors; (4) execute Z-tests, t-tests, and chi-square tests; and (5) apply hypothesis testing to real professional scenarios.
What is Hypothesis Testing?
Simple Definition
Hypothesis testing is a statistical procedure used to decide whether there is enough evidence in a sample of data to support a specific claim about a population.
Statistical Definition
Formally, hypothesis testing is an inferential statistical method that evaluates two competing statements about a population parameter — the null hypothesis and the alternative hypothesis — using sample data, a chosen significance level, and a calculated test statistic, in order to make a probabilistic decision.
Practical Meaning
Think of hypothesis testing as a courtroom trial for data. The null hypothesis is the "presumption of innocence" — we assume nothing has changed, nothing is different, no effect exists. The data is the evidence. The statistical test is the judge. If the evidence is strong enough (measured by the p-value), we reject the assumption of innocence and conclude that something real is happening.
- Healthcare: Does a new vaccine reduce infection rates significantly compared to a placebo?
- Finance: Does the average return of a portfolio significantly exceed the market benchmark?
- Marketing: Does a new website design lead to a statistically higher conversion rate?
- Auditing: Is the error rate in a batch of invoices within acceptable control limits?
- Education: Does an online tutoring program significantly improve student test scores?
The Core Idea: Falsifiability
The philosophy behind hypothesis testing comes from the scientific method — specifically, the principle that you cannot definitively prove a theory, but you can attempt to disprove it. That is why we test the null hypothesis (the "no effect" claim) and ask: is our data inconsistent enough with this claim to reject it?
Null Hypothesis (H₀)
The null hypothesis (H₀) is the default assumption that there is no effect, no difference, no relationship, or no change. It is the statement we test and attempt to reject based on evidence from data.
Purpose of the Null Hypothesis
The null hypothesis serves as the baseline position. Just as a defendant is presumed innocent until proven guilty, a population parameter is assumed unchanged or unaffected until sample data provides sufficient evidence against that assumption. This framing protects us from making false claims based on random noise in the data.
Practical Examples of Null Hypotheses
| Domain | Research Question | Null Hypothesis (H₀) |
|---|---|---|
| Finance | Does Strategy A outperform Strategy B? | Mean return of A = Mean return of B |
| Healthcare | Does Drug X reduce blood pressure? | Mean reduction with Drug X = Mean reduction with placebo |
| Marketing | Does the new ad campaign increase sales? | Mean sales before = Mean sales after campaign |
| Auditing | Is the error rate in compliance? | Population error rate ≤ acceptable threshold (e.g., 5%) |
| Education | Does online tutoring improve scores? | Mean score with tutoring = Mean score without tutoring |
| Manufacturing | Does machine produce correct weight? | Mean weight = target weight (e.g., 500g) |
Notice that every null hypothesis contains an equality statement. This is a critical rule: the null hypothesis always includes an "=" sign (whether stated as =, ≤, or ≥). This allows us to specify a single reference value against which we calculate probabilities.
Alternative Hypothesis (H₁ or Hₐ)
The alternative hypothesis (H₁) is the claim we are trying to find evidence for. It represents the possibility of an effect, difference, or relationship. If data provides sufficient evidence against H₀, we accept H₁.
Types of Alternative Hypotheses
The alternative hypothesis can take three forms depending on the direction of your claim:
| Test Type | Symbol | Meaning | Example |
|---|---|---|---|
| Two-tailed | H₁: μ ≠ μ₀ | Parameter is different (either direction) | New drug changes blood pressure (up or down) |
| Right-tailed (Upper) | H₁: μ > μ₀ | Parameter is greater than assumed value | New training increases productivity |
| Left-tailed (Lower) | H₁: μ < μ₀ | Parameter is less than assumed value | New process reduces defect rate |
Why Direction Matters
The direction of your alternative hypothesis determines where the "rejection region" is placed in the distribution. A two-tailed test splits the rejection area between both tails. A one-tailed test concentrates it in one tail, making it more powerful for detecting effects in that specific direction — but incapable of detecting effects in the opposite direction.
You must specify your alternative hypothesis before collecting data. Choosing the direction after seeing the results is a form of "p-hacking" and a serious research integrity violation.
Null vs. Alternative Hypothesis: Comparison Table
| Feature | Null Hypothesis (H₀) | Alternative Hypothesis (H₁) |
|---|---|---|
| Purpose | Default assumption to test against | The claim we seek evidence for |
| Contains | Equality (=, ≤, ≥) | Inequality (≠, >, <) |
| Default position | We assume it is true initially | Must be supported by evidence |
| What we calculate | Probability assuming H₀ is true | Accepted when H₀ is rejected |
| Decision | Reject or fail to reject | Accept if H₀ is rejected |
| Analogy | Defendant presumed innocent | Prosecution's claim of guilt |
Significance Level (Alpha, α)
The significance level (α) is the probability threshold used to decide whether to reject the null hypothesis. It represents the maximum probability of making a Type I error (rejecting a true null hypothesis) that the researcher is willing to accept.
Common Significance Levels
| Level | Value | Typical Use |
|---|---|---|
| α = 0.10 (10%) | 10% chance of Type I error | Exploratory research, pilot studies |
| α = 0.05 (5%) | 5% chance of Type I error | Standard in social sciences, business, finance |
| α = 0.01 (1%) | 1% chance of Type I error | Medical trials, quality control, high-stakes decisions |
| α = 0.001 (0.1%) | 0.1% chance of Type I error | Physics experiments (particle detection) |
The 0.05 Rule Explained
The 5% significance level, introduced by statistician Ronald Fisher in the 1920s, has become the default standard in most fields. It means: "I am willing to accept a 5% chance of incorrectly rejecting the null hypothesis." In practical terms, if you run the same experiment 100 times under conditions where H₀ is actually true, you would expect to (incorrectly) reject it 5 times by chance alone.
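The "5 rejections in 100 experiments" reading of α can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed data with a known σ = 1, a true null mean of 0, and a two-tailed Z-test; the trial count, sample size, and seed are arbitrary choices, not values from this module.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(trials=5000, n=30, alpha=0.05, seed=0):
    """Simulate two-tailed Z-tests on data for which H0 (mu = 0, sigma = 1) is true."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)       # (x_bar - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # a false positive: H0 is true here
    return rejections / trials

rate = type_i_error_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The printed share should land close to α, which is the sense in which α is the long-run false positive rate.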
The choice of α should reflect the cost of a wrong decision. In medical research or auditing, where false positives carry serious consequences, α = 0.01 is more appropriate. In exploratory business research where false positives are less costly, α = 0.10 may be acceptable.
Confidence Level
The confidence level is simply (1 − α). At α = 0.05, you have 95% confidence. At α = 0.01, you have 99% confidence. A higher confidence level requires stronger evidence to reject the null hypothesis.
The p-value
The p-value is the probability of observing a test statistic as extreme as — or more extreme than — the one calculated from the sample data, assuming the null hypothesis is true.
How to Interpret the p-value
The p-value answers this precise question: "If the null hypothesis were actually true, how likely would we be to see data this extreme just by chance?" A very small p-value means such extreme data would be very unlikely under H₀, which counts as evidence against H₀.
| p-value | Interpretation | Decision |
|---|---|---|
| p < 0.001 | Extremely strong evidence against H₀ | Reject H₀ |
| p < 0.01 | Very strong evidence against H₀ | Reject H₀ |
| p < 0.05 | Sufficient evidence against H₀ | Reject H₀ (at α = 0.05) |
| 0.05 ≤ p < 0.10 | Marginal / weak evidence | Fail to reject H₀ (at α = 0.05) |
| p ≥ 0.10 | Insufficient evidence against H₀ | Fail to reject H₀ |
p < 0.05: What It Really Means
When p < 0.05, we say the result is statistically significant at the 5% level. This means: assuming H₀ is true, the probability of observing a result at least as extreme as our sample result is less than 5%. Because this is so unlikely under H₀, we reject H₀ in favor of H₁.
p > 0.05: What It Really Means
When p > 0.05, we fail to reject H₀. This does NOT mean H₀ is proven true. It simply means our data did not provide enough evidence to reject it. Absence of evidence is not evidence of absence.
The p-value is NOT the probability that the null hypothesis is true. It is the probability of observing your data (or more extreme data) IF the null hypothesis were true. This distinction is frequently confused even by professionals.
A bank claims its loan approval process takes an average of 48 hours (μ₀ = 48). You sample 40 applications and find a mean of 52 hours with a standard deviation of 10 hours. A right-tailed Z-test gives Z = (52 − 48) / (10 / √40) ≈ 2.53 and p ≈ 0.006.
Interpretation: If the true mean were 48 hours, there would be only about a 0.6% probability of observing a sample mean of 52 hours or higher. Since 0.006 < 0.05, we reject H₀ and conclude that the average approval time significantly exceeds 48 hours.
Type I Error and Type II Error
Type I Error (False Positive)
A Type I Error occurs when we reject a true null hypothesis. We conclude that an effect exists when, in reality, it does not. The probability of a Type I error is equal to the significance level (α).
Real-World Examples of Type I Errors:
- Medicine: Approving a drug that is no more effective than a placebo because trial results appeared significant by chance.
- Finance: Concluding that a trading strategy generates excess returns when its performance was due to random market fluctuations.
- Auditing: Flagging a compliant company as non-compliant due to random sampling results.
- Marketing: Rolling out a new campaign thinking it improved conversions when the increase was random noise.
Type II Error (False Negative)
A Type II Error occurs when we fail to reject a false null hypothesis. We conclude no effect exists when, in reality, one does. The probability of a Type II error is denoted by β (beta).
Real-World Examples of Type II Errors:
- Medicine: Failing to approve an effective drug because the clinical trial sample was too small to detect the true benefit.
- Finance: Dismissing an actually profitable trading strategy as statistically insignificant due to limited data.
- Auditing: Clearing a non-compliant company because the sampled invoices happened to look clean.
- Quality Control: Passing a defective production batch because the sample used did not reveal the defects.
Statistical Power
Power = 1 − β. Power is the probability of correctly rejecting a false null hypothesis. A well-designed study aims for power ≥ 0.80 (80%), meaning there is at least an 80% chance of detecting a real effect when it exists. Power is increased by using larger sample sizes or choosing more sensitive tests.
Type I vs. Type II Error: Comparison Table
| Feature | Type I Error | Type II Error |
|---|---|---|
| Also called | False Positive | False Negative |
| What happened | Rejected a true H₀ | Failed to reject a false H₀ |
| Probability | α (significance level) | β (beta) |
| Controlled by | Reducing α | Increasing sample size / power |
| Trade-off | Lowering α increases β | Lowering β increases α (if n is fixed) |
| Medical analogy | Diagnosing a healthy person as sick | Missing a disease in a sick person |
| Legal analogy | Convicting an innocent person | Acquitting a guilty person |
The Type I / Type II Trade-Off
- For a fixed sample size, reducing Type I error risk (lowering α) increases Type II error risk.
- The way out of this trade-off is a larger sample: for a fixed α, increasing n lowers β, so both risks can be kept small at the same time.
- Context determines which error is more costly. In drug approval, a Type I error (approving ineffective drugs) can be very harmful. In disease screening, a Type II error (missing a disease) can be fatal.
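The relationship between α, β, power, and sample size can be made concrete for the simplest case, a right-tailed Z-test with known σ. The following is a sketch under stated assumptions: the null mean (100), true mean (102), and σ (8) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a right-tailed Z-test when the population mean is really mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)           # rejection threshold under H0
    shift = (mu_true - mu0) / (sigma / n ** 0.5)     # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)        # P(reject H0 | mean = mu_true)

# Hypothetical scenario: H0 says mu = 100, the true mean is 102, sigma = 8.
for n in (20, 50, 100):
    for alpha in (0.05, 0.01):
        power = power_right_tailed_z(100, 102, 8, n, alpha)
        print(f"n={n:3d}  alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

Reading down the output, a stricter α lowers power (raises β) at every fixed n, while a larger n raises power at every fixed α.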
The Hypothesis Testing Process: 7 Steps
1. State the Hypotheses. Write both the null hypothesis (H₀) and the alternative hypothesis (H₁). Decide whether the test is one-tailed (directional) or two-tailed (non-directional). The hypotheses must be mutually exclusive and exhaustive.
2. Choose the Significance Level (α). Select α before collecting data (typically 0.05). This sets the threshold for "sufficient evidence." Your choice should reflect the domain, the cost of errors, and any regulatory standards.
3. Select the Appropriate Test and Collect Data. Choose the correct statistical test (Z-test, t-test, chi-square, F-test, etc.) based on the data type, sample size, and what is being measured. Then collect or obtain your sample data.
4. Check Assumptions. Every test has assumptions (e.g., normality, independence, homogeneity of variance). Verify these before proceeding. Violating assumptions invalidates the test.
5. Calculate the Test Statistic. Apply the formula for your chosen test to compute the test statistic (Z, t, χ², etc.). This value measures how far your sample result lies from the null hypothesis value in units of standard error.
6. Find the p-value and Compare to α. Using the test statistic and its distribution, calculate the p-value. Compare it to your pre-set significance level: if p < α, reject H₀; if p ≥ α, fail to reject H₀.
7. Draw a Conclusion in Context. State your decision in plain language related to the original question. Do not just say "reject H₀"; explain what it means for the problem at hand. Report effect size and confidence intervals when relevant.
Z-Test
A Z-test is a statistical test used to determine whether the mean of a population significantly differs from a hypothesized value (or whether two population means differ), when the population standard deviation (σ) is known and the sample size is large (n ≥ 30).
When to Use a Z-Test
- Population standard deviation (σ) is known
- Sample size is large (n ≥ 30)
- Data is approximately normally distributed (or n is large enough for CLT to apply)
- Testing means or proportions
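The Z-Test Formula
Z = (x̄ − μ₀) / (σ / √n), where x̄ is the sample mean, μ₀ is the hypothesized population mean, σ is the population standard deviation, and n is the sample size.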
Critical Values for Z-Test
| Test Type | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| Two-tailed | ±1.645 | ±1.96 | ±2.576 |
| Right-tailed | +1.282 | +1.645 | +2.326 |
| Left-tailed | −1.282 | −1.645 | −2.326 |
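The critical values in this table are quantiles of the standard normal distribution, so they can be reproduced rather than memorized. A minimal sketch using `statistics.NormalDist` from the Python standard library (the helper name `z_critical` is made up for this example):

```python
from statistics import NormalDist

def z_critical(alpha, tail="two"):
    """Critical Z value for a given alpha; tail is 'two', 'right', or 'left'."""
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)   # use as plus/minus this value
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}",
          f"two-tailed=±{z_critical(alpha, 'two'):.3f}",
          f"right={z_critical(alpha, 'right'):+.3f}",
          f"left={z_critical(alpha, 'left'):+.3f}")
```

A two-tailed test puts α/2 in each tail, which is why its cutoff at α = 0.10 equals the one-tailed cutoff at α = 0.05.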
Step-by-Step Worked Example: Z-Test
Problem: A cola bottling company claims each bottle contains 500 ml (μ₀ = 500). The machine's standard deviation is known to be σ = 8 ml. A quality inspector randomly samples n = 36 bottles and finds a mean volume of x̄ = 503 ml. Test at α = 0.05 whether the machine is overfilling.
Step 1 — Hypotheses:
- H₀: μ = 500 ml (machine fills correctly)
- H₁: μ > 500 ml (machine is overfilling) — right-tailed test
Step 2 — Significance Level: α = 0.05. Critical value for right-tailed test: Z_critical = +1.645
Step 3 — Test Statistic:
Z = (x̄ − μ₀) / (σ / √n) = (503 − 500) / (8 / √36) = 3 / 1.333 ≈ 2.25
Step 4 — Decision: Z = 2.25 > Z_critical = 1.645, so we reject H₀.
Alternatively: p-value = P(Z > 2.25) ≈ 0.012. Since 0.012 < 0.05, reject H₀.
Step 5 — Conclusion: At the 5% significance level, there is sufficient statistical evidence that the machine is overfilling bottles. The mean fill (503 ml) is significantly greater than the target (500 ml). Maintenance should be scheduled.
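The bottling example can be checked in a few lines. This sketch assumes a right-tailed test with known σ, as in the problem statement; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_right_tailed(sample_mean, mu0, sigma, n):
    """Return (Z statistic, right-tailed p-value) for a one-sample Z-test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))    # distance from mu0 in standard errors
    p_value = 1 - NormalDist().cdf(z)              # P(Z >= z) under H0
    return z, p_value

# Values from the worked example: x_bar = 503, mu0 = 500, sigma = 8, n = 36.
z, p = z_test_right_tailed(503, 500, 8, 36)
alpha = 0.05
print(f"Z = {z:.2f}, p = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

Comparing p to α and comparing Z to the critical value are two views of the same decision, so they always agree.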
t-Test
A t-test is a statistical test used to compare means when the population standard deviation is unknown and must be estimated from the sample. The t-test is the workhorse of applied statistics for most real-world situations.
When to Use a t-Test
- Population standard deviation is unknown (uses sample standard deviation, s)
- Sample size is small (n < 30) or large (the test remains valid for large samples)
- Population is approximately normally distributed
Three Types of t-Tests
1. One-Sample t-Test
Used to compare the mean of a single sample to a known or hypothesized population mean.
A fund claims its monthly returns average 2%. A sample of 15 months shows x̄ = 2.4% with s = 0.8%. Test at α = 0.05.
t = (x̄ − μ₀) / (s / √n) = (2.4 − 2.0) / (0.8 / √15) ≈ 1.936. df = 14. t_critical (two-tailed, α=0.05, df=14) = ±2.145. Since |1.936| < 2.145, fail to reject H₀. There is insufficient evidence that the fund's returns differ significantly from 2%.
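For readers who want to verify the fund example, here is a small sketch of the one-sample t statistic computed from summary statistics. The critical value 2.145 is the table value quoted in the example, not something the code derives.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = s / sqrt(n)          # s estimates the unknown sigma
    return (sample_mean - mu0) / standard_error, n - 1

# Fund example: x_bar = 2.4%, mu0 = 2.0%, s = 0.8%, n = 15 months.
t_stat, df = one_sample_t(2.4, 2.0, 0.8, 15)
T_CRITICAL = 2.145                        # two-tailed, alpha = 0.05, df = 14 (from a t-table)
print(f"t = {t_stat:.3f}, df = {df}")
print("Reject H0" if abs(t_stat) > T_CRITICAL else "Fail to reject H0")
```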
2. Independent Samples t-Test (Two-Sample t-Test)
Used to compare the means of two independent groups.
Two groups of employees take a skills test: Group A (standard training, n=12, x̄=74, s=8) and Group B (new training, n=12, x̄=82, s=9). Did the new training make a significant difference?
The pooled variance is s_p² = (11 × 8² + 11 × 9²) / 22 = 72.5, so t = (82 − 74) / √(72.5 × (1/12 + 1/12)) ≈ 2.30 with df = 22. Since 2.30 > t_critical = 2.074 (α = 0.05, two-tailed), reject H₀. The group that received the new training scored significantly higher.
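The pooled two-sample calculation is easy to get wrong by hand, so a short check is useful. The sketch below assumes equal population variances (the classic pooled t-test) and works from the summary statistics given for Groups A and B; recomputing from those numbers gives a t statistic of about 2.30.

```python
from math import sqrt

def pooled_t(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Return (t statistic, df) for an equal-variance two-sample t-test (B minus A)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / standard_error, df

# Group A: n=12, mean=74, s=8.  Group B: n=12, mean=82, s=9.
t_stat, df = pooled_t(74, 8, 12, 82, 9, 12)
T_CRITICAL = 2.074                        # two-tailed, alpha = 0.05, df = 22 (from a t-table)
print(f"t = {t_stat:.3f}, df = {df}")
print("Reject H0" if abs(t_stat) > T_CRITICAL else "Fail to reject H0")
```

If the two sample standard deviations were very different, Welch's unequal-variance version would be the safer choice; that variant is not shown here.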
3. Paired Samples t-Test
Used when the same subjects are measured twice (before/after), or when two measurements are naturally paired. Works on the differences (d = x_after − x_before).
An audit firm trains 10 staff and measures error rates before and after. Mean difference d̄ = −3.2 errors/week, s_d = 2.1, n = 10.
t = d̄ / (s_d / √n) = −3.2 / (2.1 / √10) ≈ −4.82. df = 9. t_critical (left-tailed, α=0.05) = −1.833. Since −4.82 < −1.833, reject H₀. Training significantly reduced error rates.
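A paired t-test is a one-sample t-test applied to the differences. The sketch below shows both entry points: one from the summary statistics in the audit example, and one from raw before/after lists. The raw lists in the second call are made-up numbers used only to show the mechanics.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (t statistic, df) given the mean and SD of the paired differences."""
    return mean_diff / (sd_diff / sqrt(n)), n - 1

def paired_t(before, after):
    """Compute differences d = after - before, then test whether their mean is 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return paired_t_from_summary(mean(diffs), stdev(diffs), len(diffs))

# Audit example from the text: d_bar = -3.2, s_d = 2.1, n = 10.
t_stat, df = paired_t_from_summary(-3.2, 2.1, 10)
print(f"t = {t_stat:.2f}, df = {df}")

# Hypothetical raw data for three staff members, only to illustrate the raw-data path.
t_raw, df_raw = paired_t(before=[5, 6, 7], after=[4, 4, 6])
print(f"t = {t_raw:.2f}, df = {df_raw}")
```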
Z-Test vs. t-Test: Comparison Table
| Feature | Z-Test | t-Test |
|---|---|---|
| σ (population SD) | Known | Unknown (use s) |
| Typical sample size | n ≥ 30 | Any size (especially n < 30) |
| Distribution used | Standard Normal (Z) | Student's t-distribution |
| Degrees of freedom | Not applicable | df = n − 1 (or n₁+n₂−2) |
| Critical value (α=0.05, 2-tail) | ±1.96 | Varies by df (e.g., ±2.042 for df=30) |
| Real-world use | Less common (σ rarely known) | Most common in practice |
| Variance assumed | Known σ² | Estimated from sample |
Chi-Square Test (χ²)
The chi-square test (χ²) is a non-parametric statistical test used to analyze categorical data. It tests whether observed frequencies differ from expected frequencies, or whether two categorical variables are independent of each other.
When to Use Chi-Square
- Data is categorical (nominal or ordinal)
- You are counting frequencies or proportions
- Testing independence between two variables, or whether observed data fits a distribution
- Each cell in a contingency table should have expected frequency ≥ 5
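The Chi-Square Formula
χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected frequency, summed over all categories (or all cells of the contingency table).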
Two Types of Chi-Square Tests
1. Chi-Square Goodness-of-Fit Test
Tests whether a sample distribution matches a hypothesized (expected) distribution. Uses one categorical variable.
A store expects equal customer visits across 4 days (Monday–Thursday), with 25% each day. Observed: Mon=40, Tue=25, Wed=30, Thu=25 (n=120). Expected = 30 each.
χ² = (40 − 30)²/30 + (25 − 30)²/30 + (30 − 30)²/30 + (25 − 30)²/30 = 150/30 = 5.00. df = categories − 1 = 3. χ²_critical (α=0.05, df=3) = 7.815. Since 5.00 < 7.815, fail to reject H₀. Visitor distribution is consistent with expectations.
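The store-visit example can be recomputed with a short goodness-of-fit helper. This is a sketch; the critical value 7.815 is the table value quoted in the example rather than something computed by the code.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 25, 30, 25]               # Mon, Tue, Wed, Thu
expected = [30, 30, 30, 30]               # 25% of n = 120 on each day
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1
CHI2_CRITICAL = 7.815                     # alpha = 0.05, df = 3 (from a chi-square table)
print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 > CHI2_CRITICAL else "Fail to reject H0")
```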
2. Chi-Square Test of Independence
Tests whether two categorical variables are related or independent using a contingency table. Uses two categorical variables.
An auditor examines whether department (Sales vs. Finance) is independent of compliance status (Pass vs. Fail). A 2×2 contingency table is created with observed counts. Expected frequencies are calculated as: E = (Row Total × Column Total) / Grand Total. The chi-square statistic is then compared to the critical value with df = (rows − 1)(columns − 1) = 1.
If the result is significant, it means department and compliance status are not independent — compliance rates differ by department.
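To make the expected-frequency formula concrete, the sketch below runs a test of independence on a 2×2 table. The counts are hypothetical (the text gives none) and exist only to show the mechanics; 3.841 is the standard χ² critical value for α = 0.05 with df = 1.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total   # E = row x column / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts:  Pass  Fail
table = [[45, 15],    # Sales
         [30, 30]]    # Finance
chi2, df = chi_square_independence(table)
CHI2_CRITICAL = 3.841                     # alpha = 0.05, df = 1
print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0 (not independent)" if chi2 > CHI2_CRITICAL else "Fail to reject H0")
```

With these made-up counts every expected frequency is at least 5, so the usual sample-size condition for the test is met.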
Degrees of Freedom for Chi-Square
| Test | df Formula |
|---|---|
| Goodness-of-fit | df = k − 1 (where k = number of categories) |
| Test of Independence | df = (rows − 1)(columns − 1) |
Choosing the Right Statistical Test
| Situation | Data Type | n | σ Known? | Recommended Test |
|---|---|---|---|---|
| Compare sample mean to known value | Continuous | ≥30 | Yes | Z-test |
| Compare sample mean to known value | Continuous | Any | No | One-sample t-test |
| Compare means of two independent groups | Continuous | Any | No | Independent t-test |
| Compare before/after measurements (same subjects) | Continuous | Any | No | Paired t-test |
| Test if categorical data fits an expected distribution | Categorical | — | N/A | Chi-square goodness-of-fit |
| Test relationship between two categorical variables | Categorical | — | N/A | Chi-square independence |
| Compare means of 3+ groups | Continuous | Any | No | ANOVA (F-test) |
| Relationship between two continuous variables | Continuous | Any | N/A | Pearson correlation / regression |
A useful decision rule: if your outcome variable is numerical, use Z or t-tests; if it is categorical, use chi-square. If you are comparing more than two groups, consider ANOVA (Module 7).
Hypothesis Testing in Finance
Investment Performance Analysis
Portfolio managers use hypothesis testing to determine whether a fund's returns differ significantly from a benchmark. A one-sample t-test can test whether mean excess returns (alpha) are significantly above zero. Without statistical testing, a fund with a few good years might appear to outperform when its results are indistinguishable from random variation.
Risk Assessment
Finance professionals test whether the variance (or Value-at-Risk) of a portfolio falls within acceptable parameters. They also use hypothesis tests to compare the risk profiles of different asset classes or to validate risk models against historical data.
Market Research and Pricing
Two-sample t-tests are used to compare market prices or transaction volumes between two periods (e.g., pre- and post-policy change). Chi-square tests are used to analyze whether investor sentiment (positive, neutral, negative) is independent of market sector.
Event Studies
Finance researchers use t-tests to detect abnormal stock returns around specific events (earnings announcements, mergers, regulatory changes). A statistically significant abnormal return implies the market incorporated new information.
Hypothesis Testing in Auditing
Compliance Testing
Auditors use hypothesis testing to determine whether an organization's control error rate exceeds the tolerable deviation rate. A one-sample proportion Z-test can assess: H₀: population error rate ≤ 5% (acceptable). If the sample error rate is significantly above 5%, the auditor concludes internal controls are inadequate.
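A one-sample proportion Z-test for this compliance question might look like the sketch below. The sample (17 deviations in 200 tested items) is hypothetical, and the normal approximation assumes the sample is large enough that both n·p₀ and n·(1 − p₀) are comfortably above 5.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test_right_tailed(errors, n, p0):
    """Test H0: p <= p0 against H1: p > p0 using the normal approximation."""
    p_hat = errors / n
    standard_error = sqrt(p0 * (1 - p0) / n)       # SE computed under H0
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical audit sample: 17 control deviations found in 200 items, tolerable rate 5%.
z, p = proportion_z_test_right_tailed(errors=17, n=200, p0=0.05)
print(f"Z = {z:.2f}, p = {p:.4f}")
print("Controls look inadequate" if p < 0.05 else "No evidence the rate exceeds 5%")
```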
Audit Sampling
Rather than examining every transaction, auditors sample and use hypothesis tests to draw conclusions about the entire population of transactions. The tests determine whether the observed sample errors are consistent with an acceptable population error rate.
Fraud Investigation
Statistical anomalies detected through hypothesis testing can be indicators of fraud. For example, testing whether the frequency distribution of leading digits in financial data follows Benford's Law (using chi-square goodness-of-fit) is a known fraud-detection technique. Significant deviation from the expected distribution may signal fabricated entries.
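The Benford check is a chi-square goodness-of-fit test in which the expected share of leading digit d is log₁₀(1 + 1/d). A minimal sketch follows; both sets of digit counts are invented for illustration, and 15.507 is the standard χ² critical value for α = 0.05 with df = 8.

```python
from math import log10

# Expected share of each leading digit 1..9 under Benford's Law.
BENFORD = {d: log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRITICAL_DF8 = 15.507                 # alpha = 0.05, df = 9 - 1 = 8

def benford_chi_square(counts):
    """counts maps leading digit (1-9) to its observed frequency."""
    total = sum(counts.values())
    return sum((counts.get(d, 0) - total * share) ** 2 / (total * share)
               for d, share in BENFORD.items())

# Hypothetical ledgers: one close to Benford, one with suspiciously even digits.
plausible = dict(zip(range(1, 10), [301, 176, 125, 97, 79, 67, 58, 51, 46]))
suspicious = dict(zip(range(1, 10), [111] * 9))

for name, counts in (("plausible", plausible), ("suspicious", suspicious)):
    chi2 = benford_chi_square(counts)
    flag = "deviates from Benford" if chi2 > CHI2_CRITICAL_DF8 else "consistent with Benford"
    print(f"{name}: chi-square = {chi2:.2f} -> {flag}")
```

A significant result is a prompt for further audit work, not proof of fraud, since legitimate data sets can also depart from Benford's Law.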
Hypothesis Testing in Business
A/B Testing for Products and Marketing
Businesses run A/B tests to compare two versions of a product, webpage, or advertisement. A two-sample Z-test or t-test determines whether the observed difference in conversion rates between Version A and Version B is statistically significant — or merely due to chance. A/B testing is the direct industrial application of hypothesis testing.
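For conversion rates, the two-sample test is usually a two-proportion Z-test with a pooled standard error. The sketch below uses hypothetical traffic and conversion counts and a two-tailed alternative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-tailed Z-test of H0: p_A = p_B using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)               # common rate assumed under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B test: 200 of 5,000 visitors convert on A, 250 of 5,000 on B.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"Z = {z:.2f}, p = {p:.4f}")
print("Significant at alpha = 0.05" if p < 0.05 else "Not significant at alpha = 0.05")
```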
Customer Satisfaction Analysis
Companies use t-tests to compare satisfaction scores before and after a service change. Chi-square tests assess whether customer satisfaction ratings (Satisfied / Neutral / Dissatisfied) are independent of customer segment (new vs. returning).
Quality Control and Six Sigma
Manufacturing operations use Z-tests and t-tests to determine whether production processes meet specifications. Six Sigma methodology relies heavily on hypothesis testing: process changes are evaluated with statistical tests, and improvement continues until the defect rate is driven to near-zero levels.
Common Hypothesis Testing Mistakes (and How to Avoid Them)
- Misinterpreting the p-value. The p-value is not the probability that H₀ is true — it is the probability of obtaining data this extreme if H₀ were true. Solution: Always state p-value interpretations correctly.
- Equating "fail to reject H₀" with "accept H₀." Not finding evidence against H₀ does not prove it is true. Solution: Use the phrase "fail to reject" rather than "accept."
- Confusing statistical significance with practical significance. A statistically significant result (p < 0.05) may have a trivially small effect size. Solution: Always report effect sizes (Cohen's d, eta-squared, etc.) alongside p-values.
- Choosing the wrong test. Using a Z-test when σ is unknown, or using t-tests for categorical data. Solution: Follow the test selection decision table.
- Ignoring test assumptions. Running a t-test on severely non-normal data with small samples. Solution: Always check normality, independence, and variance homogeneity assumptions.
- Using a one-tailed test when a two-tailed is appropriate. One-tailed tests are more powerful but only valid when the direction of difference is specified in advance. Solution: Default to two-tailed unless there is a strong theoretical justification.
- P-hacking (data dredging). Running multiple tests and only reporting the significant ones inflates the Type I error rate. Solution: Pre-register hypotheses; apply Bonferroni correction for multiple tests.
- Setting α after seeing the data. Choosing your significance level after peeking at the results. Solution: Always set α before data collection.
- Insufficient sample size. Underpowered studies (small n) miss real effects. Solution: Conduct power analysis before data collection to determine adequate n.
- Assuming causation from correlation. A statistically significant relationship does not imply that one variable causes another. Solution: Apply causal reasoning frameworks; use controlled experiments.
- Using mean comparisons on ordinal data. Running a t-test on Likert scale data (1–5 ratings). Solution: Use non-parametric tests (Mann-Whitney, Wilcoxon) for ordinal data.
- Ignoring outliers. Extreme values can inflate variance and distort test statistics. Solution: Identify and assess outliers; use robust statistical methods when necessary.
- Neglecting the effect of multiple comparisons. Testing 20 variables at α = 0.05 means about one false positive is expected by chance even if every null hypothesis is true (and, for independent tests, the chance of at least one is roughly 64%). Solution: Apply false discovery rate (FDR) corrections in multiple-testing contexts.
- Over-relying on hypothesis tests without confidence intervals. A confidence interval conveys both significance and the size/direction of effects. Solution: Always accompany hypothesis test results with confidence intervals.
- Not reporting null results. Publication bias toward significant results distorts the scientific record. Solution: Report all tests, including non-significant findings, with context.
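The multiple-testing items in the list above come down to simple arithmetic. The sketch below assumes the tests are independent, which is the textbook simplification; the function names are illustrative.

```python
def expected_false_positives(alpha, m):
    """Expected number of false positives across m tests when every H0 is true."""
    return alpha * m

def familywise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m

alpha, m = 0.05, 20
print(f"Expected false positives: {expected_false_positives(alpha, m):.1f}")
print(f"Chance of at least one:   {familywise_error_rate(alpha, m):.3f}")
adjusted = bonferroni_alpha(alpha, m)
print(f"Bonferroni per-test alpha: {adjusted:.4f}")
print(f"Family-wise rate after correction: {familywise_error_rate(adjusted, m):.3f}")
```

Bonferroni is conservative: it controls false positives at the cost of power, which is why FDR methods are often preferred when many tests are run.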
Practical Case Study: Retail Bank Loan Processing
Business Problem: FastTrack Bank's management believes its digital loan processing system has reduced the average approval time from the historical average of 72 hours. A quality team samples 25 recent applications and records the following approval times (in hours):
60, 65, 68, 70, 71, 63, 75, 72, 69, 67, 64, 66, 70, 73, 61, 68, 74, 62, 65, 69, 70, 63, 67, 71, 68
Step 1 — Hypotheses:
- H₀: μ ≥ 72 (processing time has not decreased)
- H₁: μ < 72 (processing time has decreased) — left-tailed test
Step 2 — Significance Level: α = 0.05
Step 3 — Test Selection: One-sample t-test (σ unknown, n = 25)
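Step 4 — Sample Statistics:
n = 25, x̄ = 1691 / 25 = 67.64 hours, s ≈ 4.05 hours (sample standard deviation).
Step 5 — Calculate Test Statistic:
t = (x̄ − μ₀) / (s / √n) = (67.64 − 72) / (4.05 / √25) ≈ −5.38
Step 6 — Critical Value and Decision:
df = 24. t_critical (left-tailed, α = 0.05) = −1.711.
Since −5.38 < −1.711, we reject H₀.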
p-value < 0.0001.
Step 7 — Conclusion: At the 5% significance level, there is very strong statistical evidence that FastTrack Bank's digital loan processing system has significantly reduced the average approval time from 72 hours. The sample mean of 67.64 hours represents an estimated reduction of approximately 4.36 hours. Management's decision to invest in the digital system is statistically justified.
Business Recommendation: FastTrack should continue and scale the digital processing system. Future analysis should examine whether the reduction is consistent across all loan types and customer segments.
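The sample statistics and test statistic for the FastTrack case can be recomputed directly from the 25 listed approval times. The following is a minimal sketch using only Python's standard library; the critical value −1.711 is the table value quoted in Step 6, not something the code derives.

```python
from math import sqrt
from statistics import mean, stdev

approval_hours = [60, 65, 68, 70, 71, 63, 75, 72, 69, 67, 64, 66, 70,
                  73, 61, 68, 74, 62, 65, 69, 70, 63, 67, 71, 68]

mu0 = 72                                   # historical average approval time
n = len(approval_hours)
x_bar = mean(approval_hours)
s = stdev(approval_hours)                  # sample standard deviation (n - 1 denominator)
t_stat = (x_bar - mu0) / (s / sqrt(n))
T_CRITICAL = -1.711                        # left-tailed, alpha = 0.05, df = 24 (from a t-table)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, t = {t_stat:.2f}")
print("Reject H0" if t_stat < T_CRITICAL else "Fail to reject H0")
```

Working from the raw data rather than from rounded summary figures avoids small rounding differences in the reported t statistic.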
Key Takeaways
Module 6 — Core Concepts to Remember
- Hypothesis testing is a framework for making evidence-based decisions about population parameters using sample data.
- The null hypothesis (H₀) represents the "no effect / no change" position; the alternative hypothesis (H₁) represents what we are trying to prove.
- The significance level (α) is the threshold for rejecting H₀. The most common value is α = 0.05 (5%).
- The p-value is the probability of obtaining results as extreme as observed, assuming H₀ is true. A small p-value (p < α) leads to rejection of H₀.
- A Type I Error (false positive, probability = α) is rejecting a true H₀. A Type II Error (false negative, probability = β) is failing to reject a false H₀.
- Use a Z-test when σ is known and n ≥ 30. Use a t-test when σ is unknown.
- Use the chi-square test for categorical data — either to test goodness-of-fit or independence between two categorical variables.
- Statistical significance does not equal practical importance. Always assess effect size.
- The 7-step process: state hypotheses → choose α → select test → check assumptions → calculate test statistic → compare to critical value/p-value → draw conclusion.
Practice Exercises
Part A: 20 Conceptual Questions
- In your own words, explain the difference between a null hypothesis and an alternative hypothesis.
- Why is the null hypothesis always stated with an equality?
- What does a p-value of 0.03 mean in the context of a hypothesis test?
- A researcher finds p = 0.08 at α = 0.05. What is the decision, and what does it mean?
- Explain the difference between Type I and Type II errors using a medical testing analogy.
- Why can lowering α increase the risk of Type II error (for fixed n)?
- What is statistical power, and why does it matter?
- When would you use a one-tailed test instead of a two-tailed test?
- Explain why a statistically significant result is not necessarily practically meaningful.
- Why should you never change your significance level after seeing the data?
- What is the difference between a Z-test and a t-test? When is each appropriate?
- What are the assumptions of the independent samples t-test?
- Explain the chi-square test of independence. What is the null hypothesis in such a test?
- A researcher runs 20 hypothesis tests at α = 0.05. How many false positives should they expect by chance, even if all null hypotheses are true?
- What is the Bonferroni correction and when would you apply it?
- In auditing, which error type is potentially more dangerous: Type I or Type II? Explain your reasoning.
- What does "95% confidence level" mean in the context of hypothesis testing?
- Why is it incorrect to say "we accept the null hypothesis"?
- How does increasing the sample size affect the p-value? Why?
- What is "p-hacking," and why is it a serious problem in research?
Part B: 10 Numerical Problems (with Solutions)
Problem 1. A machine fills bags with 1 kg of flour (μ₀ = 1000g, σ = 20g). A sample of 49 bags gives x̄ = 994g. Test at α = 0.05 (two-tailed) whether the machine is working correctly.
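One possible way to work Problem 1 is sketched below as a one-sample, two-tailed Z-test using only the Python standard library. The variable names are illustrative, not part of the module.

```python
from math import sqrt
from statistics import NormalDist

# Values from Problem 1
mu0, sigma, n, xbar, alpha = 1000.0, 20.0, 49, 994.0, 0.05

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error
p_value = 2 * NormalDist().cdf(-abs(z))        # two-tailed p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Here z = −2.10 and the two-tailed p-value is about 0.036, so H₀ is rejected at α = 0.05: the sample suggests the machine is not filling to 1000g on average.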
Problem 2. A tire company claims mean tire life = 50,000 km. A sample of 16 tires has x̄ = 48,500 km, s = 3,200 km. Test at α = 0.05 (left-tailed).
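A sketch of Problem 2 as a one-sample, left-tailed t-test follows. The standard library has no t-distribution, so the critical value for 15 degrees of freedom is taken from a standard t-table (an assumption of this sketch) and the decision uses the critical-value method rather than a p-value.

```python
from math import sqrt

# Values from Problem 2
mu0, n, xbar, s = 50_000.0, 16, 48_500.0, 3_200.0

df = n - 1
standard_error = s / sqrt(n)
t_stat = (xbar - mu0) / standard_error

# One-tailed critical value for alpha = 0.05, df = 15, from a standard t-table
t_crit = 1.753

reject_h0 = t_stat < -t_crit   # left-tailed rejection region
print(f"t = {t_stat:.3f}, df = {df}, critical = {-t_crit}, reject H0: {reject_h0}")
```

The statistic t = −1.875 falls below −1.753, so H₀ is rejected: the sample supports a mean tire life below 50,000 km.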
Problem 3. Two sales teams: Team A (n = 15, x̄ = 85K, s = 12K) and Team B (n = 15, x̄ = 80K, s = 10K). Test at α = 0.05 (two-tailed) whether the means differ significantly.
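Problem 3 can be sketched as an independent-samples t-test. This version assumes equal population variances and uses the pooled-variance form, which is one common choice and not stated in the problem. The two-tailed critical value for 28 degrees of freedom comes from a standard t-table.

```python
from math import sqrt

# Values from Problem 3 (in thousands)
n_a, mean_a, s_a = 15, 85.0, 12.0
n_b, mean_b, s_b = 15, 80.0, 10.0

df = n_a + n_b - 2
# Pooled variance, assuming equal population variances
pooled_var = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df
standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / standard_error

# Two-tailed critical value for alpha = 0.05, df = 28, from a standard t-table
t_crit = 2.048

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, critical = ±{t_crit}, reject H0: {reject_h0}")
```

With t ≈ 1.24 inside the ±2.048 bounds, we fail to reject H₀: the data do not show a significant difference between the team means. Because the two samples are the same size, the unpooled (Welch) statistic takes the same value here.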
Problem 4. Loan approval times were measured before and after a policy change (10 pairs). The mean difference is d̄ = 2.3 days with s_d = 1.8 days. Test at α = 0.05 (one-tailed) whether the policy reduced approval time.
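A sketch of Problem 4 as a paired t-test follows. It assumes each difference is computed as d = before − after, so that a positive d̄ indicates a reduction; the problem does not state the direction explicitly. The one-tailed critical value for 9 degrees of freedom is taken from a standard t-table.

```python
from math import sqrt

# Values from Problem 4; d = before − after is an assumption of this sketch
n_pairs, d_bar, s_d = 10, 2.3, 1.8

df = n_pairs - 1
standard_error = s_d / sqrt(n_pairs)
t_stat = d_bar / standard_error   # H0: mean difference = 0

# One-tailed critical value for alpha = 0.05, df = 9, from a standard t-table
t_crit = 1.833

reject_h0 = t_stat > t_crit       # right-tailed on d = before − after
print(f"t = {t_stat:.3f}, df = {df}, critical = {t_crit}, reject H0: {reject_h0}")
```

Under that assumption t ≈ 4.04 exceeds 1.833, so H₀ is rejected: the data support a reduction in approval time.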
Problem 5. A die is rolled 60 times. Each face should appear 10 times. Observed: 8, 9, 12, 11, 11, 9. Test fairness using chi-square at α = 0.05.
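Problem 5 can be sketched as a chi-square goodness-of-fit test. The critical value for 5 degrees of freedom at α = 0.05 is taken from a standard chi-square table, since the standard library does not provide the distribution.

```python
# Observed counts from Problem 5 (faces 1 through 6)
observed = [8, 9, 12, 11, 11, 9]

# A fair die gives equal expected counts for every face
expected = sum(observed) / len(observed)

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

# Critical value for alpha = 0.05, df = 5, from a standard chi-square table
chi_crit = 11.070

reject_h0 = chi_sq > chi_crit
print(f"chi-square = {chi_sq:.2f}, df = {df}, critical = {chi_crit}, reject H0: {reject_h0}")
```

The statistic is 1.2, far below 11.070, so we fail to reject H₀: the observed counts are consistent with a fair die.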
Problems 6–10: Available in the module workbook (extended numerical problems covering two-proportion Z-tests, chi-square independence tests, power analysis, ANOVA preview, and finance applications).
Multiple Choice Quiz: 30 Questions
Frequently Asked Questions (FAQs)
Final Module Summary
Module 6 has taken you through one of the most important analytical frameworks in applied statistics: hypothesis testing. Here is a complete recap of everything covered:
Foundations: Hypothesis testing gives analysts a principled, mathematical way to determine whether sample evidence is strong enough to support a claim about a population. The framework begins with two competing hypotheses: the null hypothesis (H₀) — the default claim of no effect — and the alternative hypothesis (H₁) — the claim we wish to support.
Decision Rules: The significance level (α) is the threshold for our decision. The p-value, the probability of observing data at least this extreme assuming H₀ is true, is compared to α. When p < α, we reject H₀ and conclude that the results are statistically significant.
Errors: Two types of mistakes are possible. A Type I Error (false positive, probability = α) means rejecting a true H₀. A Type II Error (false negative, probability = β) means failing to reject a false H₀. Statistical power (1 − β) measures our ability to detect real effects.
Tests Covered:
- Z-Test: For large samples with known population standard deviation. Uses the standard normal distribution.
- t-Test (one-sample, independent, paired): For unknown σ. The most widely used test in practice. Uses the t-distribution with appropriate degrees of freedom.
- Chi-Square Test (goodness-of-fit and independence): For categorical data. Tests whether observed frequencies match expected frequencies or whether two categorical variables are related.
Applications: Hypothesis testing drives decision-making across finance (investment analysis, event studies), auditing (compliance testing, fraud detection), and business (A/B testing, quality control, customer analysis).
Best Practices: Always set hypotheses and α before data collection. Report effect sizes alongside p-values. Verify test assumptions. Avoid p-hacking and multiple testing inflation. Distinguish statistical significance from practical importance.
You have now completed Module 6: Hypothesis Testing. You are equipped to design, execute, and interpret hypothesis tests in academic and professional contexts.
Applied Statistics Course | Module 6 | Hypothesis Testing
© Applied Statistics Course. Educational content for students, researchers, and professionals.