The p-value is one of the most widely used, and most widely misunderstood, concepts in medical research. Almost every paper in a medical journal reports one, yet many researchers hold subtle misconceptions about what it actually means. This guide explains p-values clearly, addresses common myths, and shows how to interpret and report them correctly.
What is a P-Value?
The p-value is the probability of observing your data (or more extreme data) IF the null hypothesis were true. It is a conditional probability, not an absolute measure of truth.
- Null Hypothesis (H0): Assumes no effect, no difference, no association exists
- Alternative Hypothesis (H1): States that an effect, difference, or association exists
- p-value ranges from 0 to 1
- Smaller p-value = your data are less compatible with the null hypothesis
- Conventional threshold: p < 0.05 (5% significance level), popularized by Ronald Fisher in 1925 as a convenient convention
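One way to make the definition concrete is a permutation test: if the null hypothesis is true, group labels are interchangeable, so the p-value can be estimated as the share of random relabelings that produce a difference at least as large as the one observed. The sketch below is a minimal illustration, not a recommended analysis pipeline. The blood pressure readings and the function name `permutation_p_value` are invented for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the labels carry no information, so shuffle them.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            at_least_as_extreme += 1

    # Adding 1 to both counts keeps the estimate from being exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical systolic BP readings (mmHg), invented for illustration only.
treatment = [128, 131, 125, 122, 135, 127, 124, 130, 126, 129]
control = [138, 135, 141, 133, 139, 136, 142, 134, 137, 140]

p = permutation_p_value(treatment, control)
print(f"Estimated p-value: {p:.4f}")
```

The returned value answers only one question: how often would a difference this large arise if the treatment did nothing? It says nothing about how likely the treatment is to work.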
How to Interpret P-Values
- p < 0.05: Results are statistically significant — reject the null hypothesis
- p ≥ 0.05: Results are not statistically significant — fail to reject null hypothesis
- p < 0.001: Highly statistically significant — very strong evidence against null
- p = 0.049 and p = 0.051 are statistically virtually identical — do not treat them as categorically different
- The threshold 0.05 is a convention, not a law of nature — some fields use 0.01 or 0.001
- A significant p-value tells you the data are hard to reconcile with the null hypothesis, NOT how large the effect is
Common Misconceptions About P-Values
- Myth: "p < 0.05 means there is a 95% chance the result is true." Fact: The p-value says nothing about the probability that your hypothesis is true.
- Myth: "p = 0.04 is meaningful, p = 0.06 is not." Fact: This binary thinking is misleading — treat p-values as continuous measures of evidence.
- Myth: "A significant p-value means the effect is clinically important." Fact: A huge study can produce p < 0.001 for a trivially small, clinically meaningless effect.
- Myth: "p > 0.05 proves the null hypothesis." Fact: Absence of evidence is not evidence of absence — it may just mean your study was underpowered.
- Myth: "The p-value is the probability that the null hypothesis is true." Fact: It is the probability of observing data at least this extreme IF the null hypothesis were true.
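The gap between "p < 0.05" and "95% chance the result is true" can be shown with simple arithmetic. The probability that a significant finding reflects a real effect depends on how plausible the hypothesis was beforehand and on the study's power, neither of which is contained in the p-value. The sketch below uses assumed inputs (10% of tested hypotheses are true, 80% power), and the function name is hypothetical.

```python
def prob_effect_is_real(prior, power, alpha):
    """Probability that an effect is real, given a significant result.

    prior: assumed share of tested hypotheses that are actually true
    power: probability of a significant result when the effect is real
    alpha: probability of a significant result when the null is true
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Assumed scenario for illustration: 1 in 10 hypotheses is true, power is 80%.
ppv = prob_effect_is_real(prior=0.10, power=0.80, alpha=0.05)
print(f"Share of significant results that are real effects: {ppv:.2f}")
print(f"Share of significant results that are false positives: {1 - ppv:.2f}")
```

Under these assumptions, roughly a third of "significant" findings would be false positives, far more than the 5% many readers expect. Different priors give different answers, which is exactly why the p-value alone cannot supply this probability.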
Confidence Intervals vs P-Values
Confidence intervals (CIs) provide more information than p-values and are preferred by many journals.
- A 95% CI means: if the study were repeated many times, about 95% of the intervals calculated this way would contain the true effect
- If the 95% CI for an odds ratio excludes 1.0, the result is statistically significant at the 5% level (equivalent to p < 0.05)
- CIs show both statistical significance AND the precision/magnitude of the estimate
- Wide CI = large uncertainty, often due to a small sample size
- Narrow CI = high precision, usually from a large sample size
- CONSORT, STROBE, and other reporting guidelines call for effect estimates with a measure of precision such as a 95% CI, not p-values alone
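The link between a confidence interval and a p-value for an odds ratio can be sketched with the standard log (Wald) method for a 2x2 table. This is an illustration under stated assumptions: the counts are invented, the function name `odds_ratio_ci` is hypothetical, and the method is a large-sample approximation that is unreliable with small or zero cell counts.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio, Wald confidence interval and two-sided p-value.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)

    # Two-sided p-value for H0: odds ratio = 1 (log odds ratio = 0).
    z_stat = log_or / se_log_or
    p_value = 2 * NormalDist().cdf(-abs(z_stat))
    return odds_ratio, lower, upper, p_value


# Hypothetical counts, invented for illustration only.
or_est, lower, upper, p_value = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {or_est:.2f}, 95% CI: {lower:.2f} to {upper:.2f}, p = {p_value:.3f}")
```

Because the interval and the p-value come from the same standard error, the 95% CI excludes 1.0 exactly when p < 0.05. The interval adds what the p-value cannot: the range of effect sizes compatible with the data.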
Statistical vs Clinical Significance
A result can be statistically significant without being clinically meaningful — and vice versa.
- Statistically significant, not clinically significant: Drug lowers BP by 1 mmHg (p < 0.001 in n=50,000 trial) — not clinically useful
- Clinically significant, not statistically significant: New treatment improves survival by 20% but p = 0.08 in a small pilot study (underpowered)
- Always interpret results in the context of effect size and clinical plausibility
- Effect size measures: Cohen's d, odds ratio, number needed to treat (NNT)
- NNT: how many patients must be treated for one additional patient to benefit; often easier to interpret clinically than a p-value
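The contrast between statistical and clinical significance can be checked numerically. The sketch below uses a simple two-sample z-test to show how the same 1 mmHg difference gives very different p-values at different sample sizes, and computes an NNT from two event rates. The standard deviation of 15 mmHg, the equal arm sizes, the event rates, and the function names are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means (equal arms, known SD)."""
    standard_error = sd * math.sqrt(2 / n_per_arm)
    z = mean_diff / standard_error
    return 2 * NormalDist().cdf(-abs(z))


def number_needed_to_treat(control_event_rate, treatment_event_rate):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    absolute_risk_reduction = control_event_rate - treatment_event_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce the event rate")
    # Round first so floating-point noise does not push the result up by one.
    return math.ceil(round(1 / absolute_risk_reduction, 9))


# Same 1 mmHg effect, assumed SD of 15 mmHg, two very different trial sizes.
p_large = two_sample_z_p_value(mean_diff=1.0, sd=15.0, n_per_arm=25_000)
p_small = two_sample_z_p_value(mean_diff=1.0, sd=15.0, n_per_arm=50)
print(f"n = 50,000 total: p = {p_large:.2e}")
print(f"n = 100 total:    p = {p_small:.3f}")

# Hypothetical event rates: 20% with control, 15% with treatment.
nnt = number_needed_to_treat(0.20, 0.15)
print(f"NNT = {nnt}")
```

The effect is identical in both trials; only the sample size changes the p-value. That is why the effect size and a measure such as NNT need to be read alongside it.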
Type I and Type II Errors
- Type I Error (α): Rejecting a true null hypothesis — a false positive. Rate = significance level (0.05 = 5% risk)
- Type II Error (β): Failing to reject a false null hypothesis — a false negative. Rate = 1 - Power
- Power (1-β): Probability of detecting a true effect. Usually set at 80% or 90%
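- Lower α (e.g., 0.01) reduces Type I error but, for a fixed sample size, increases Type II error
- Adequate sample size reduces the Type II error rate (raises power); the Type I error rate is fixed by the chosen α, not by sample size
- Multiple testing problem: with 20 independent tests at α = 0.05 and no real effects, one false positive is expected on average, and the chance of at least one is about 64% (1 - 0.95^20). Corrections such as Bonferroni control this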
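The trade-offs in this list can be explored with a few lines of arithmetic. The sketch below computes the family-wise error rate for independent tests, the Bonferroni-adjusted threshold, and the approximate power of a two-sided two-sample z-test. The function names, the 5 mmHg effect, and the 15 mmHg standard deviation are assumptions for illustration; a real sample size calculation should use dedicated software and the planned analysis method.

```python
import math
from statistics import NormalDist


def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


def approx_power(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with equal arms."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect / (sd * math.sqrt(2 / n_per_arm))
    # Probability that the test statistic lands in either rejection region.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


print(f"FWER, 20 tests at 0.05: {family_wise_error_rate(0.05, 20):.3f}")
adjusted = bonferroni_alpha(0.05, 20)
print(f"Bonferroni threshold:   {adjusted:.4f}")
print(f"FWER after correction:  {family_wise_error_rate(adjusted, 20):.3f}")

# Assumed scenario: detect a 5 mmHg difference when the SD is 15 mmHg.
for n in (50, 142):
    print(f"n per arm = {n}: power = {approx_power(5, 15, n):.2f}")
print(f"n per arm = 142, alpha = 0.01: power = {approx_power(5, 15, 142, 0.01):.2f}")
```

Raising the sample size increases power without changing α, while tightening α at the same sample size lowers power. Both relationships show up directly in the printed values.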
Reporting P-Values Correctly
- Report exact p-values: p = 0.023, not just "p < 0.05"
- For very small values: p < 0.001 (do not write p = 0.0000)
- Always report alongside effect size and 95% CI
- Use 3 decimal places: p = 0.034 not p = 0.0344
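- APA style asks for exact p-values, and the ICMJE recommendations advise against relying on p-values alone
- Avoid phrases like "trend toward significance" for p = 0.06: report the exact p-value with the effect size and 95% CI, and let readers judge the strength of the evidence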
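The formatting rules above are easy to apply consistently with a small helper. This is a minimal sketch of those rules only (three decimal places, and "p < 0.001" for very small values). The function name `format_p_value` is hypothetical, and individual journals may ask for a different style.

```python
def format_p_value(p):
    """Format a p-value for reporting: exact to 3 decimals, floor at 0.001."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "p < 0.001"
    if p > 0.999:
        # Avoid printing "p = 1.000" after rounding.
        return "p > 0.999"
    return f"p = {p:.3f}"


for value in (0.0344, 0.023, 0.00004, 0.06):
    print(format_p_value(value))
```

Note that 0.06 is printed as an exact value like any other; the helper makes no judgment about significance.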
Frequently Asked Questions
Quick answers to common questions about p-values in medical research
The 0.05 threshold was proposed by Ronald Fisher in 1925 as a convenient convention, not a fundamental law. It means you accept a 5% risk of a false positive when the null hypothesis is actually true. Some fields (genomics, particle physics) use much stricter thresholds. The 0.05 threshold remains standard in clinical medicine but is increasingly questioned by statisticians.
A p-value of 0.001 means there is only a 0.1% probability of observing data at least as extreme as yours, assuming the null hypothesis is true. It provides very strong evidence against the null hypothesis and is considered highly statistically significant. It does NOT mean there is a 99.9% chance your hypothesis is true.
Studies with non-significant results can be published. Many journals (including PLOS ONE, PLOS Medicine, and specialty journals) publish high-quality studies with non-significant results. Negative results are scientifically valuable because they prevent other researchers from pursuing the same dead end. Publication bias (only publishing significant results) is a serious problem in medical research.
P-hacking (or data dredging) involves manipulating analyses until a significant p-value is found, for example by selectively excluding outliers, adding covariates, or testing multiple subgroups without correction. It is a major cause of the reproducibility crisis in medicine. Pre-registering study hypotheses and analysis plans in a registry such as ClinicalTrials.gov (for trials) or PROSPERO (for systematic reviews) helps prevent p-hacking.
When reporting results, always include: the test statistic (t, F, chi-square, z), degrees of freedom, exact p-value, effect size (Cohen's d, OR, RR, mean difference), and 95% confidence interval. Example: "Systolic BP was significantly lower in the treatment group (mean difference: 8.2 mmHg, 95% CI: 4.5 to 11.9, t(98) = 4.40, p < 0.001)."