p-Value Calculator

Convert any test statistic (z, t, F, or χ²) into a p-value. Supports one-tailed and two-tailed tests, and reports critical values, a plain-language significance interpretation, and Bonferroni-corrected thresholds for multiple comparisons.

How to Use This Calculator

  1. Select your distribution (Z, t, Chi-Square, or F).
  2. Enter your test statistic value.
  3. Enter degrees of freedom if required (t, χ², F).
  4. Choose one-tailed or two-tailed.
  5. Read the p-value and significance interpretation instantly.
  6. The Professional tab adds the Bonferroni correction and p-hacking warnings; a runnable sketch of the same computation appears after the Formula section below.

Formula

Z: p = 2 × P(Z > |z|) for two-tailed; p = P(Z > z) for an upper one-tailed test

t: p = 2 × P(T_df > |t|) for two-tailed

χ²: p = P(χ²_df > χ²_obs) (upper tail, the usual case)

F: p = P(F_{df1,df2} > F_obs) (upper tail)
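
These conversions can be sketched in a few lines of Python with scipy.stats; the helper names below are illustrative, not the calculator's internal implementation:

    # Convert a test statistic to a p-value using scipy.stats survival
    # functions, sf(x) = P(X > x). Helper names are illustrative only.
    from scipy import stats

    def p_from_z(z, two_tailed=True):
        p = stats.norm.sf(abs(z))          # P(Z > |z|)
        return 2 * p if two_tailed else p

    def p_from_t(t, df, two_tailed=True):
        p = stats.t.sf(abs(t), df)         # P(T_df > |t|)
        return 2 * p if two_tailed else p

    def p_from_chi2(x, df):
        return stats.chi2.sf(x, df)        # upper tail: P(chi2_df > x)

    def p_from_f(x, df1, df2):
        return stats.f.sf(x, df1, df2)     # upper tail: P(F_df1,df2 > x)

    print(p_from_z(1.96))          # ~0.0500
    print(p_from_t(2.093, 19))     # ~0.0500 (t critical value, 19 df)
    print(p_from_chi2(3.841, 1))   # ~0.0500 (chi-square critical, 1 df)
    print(p_from_f(4.351, 1, 20))  # ~0.0500 (F critical, df 1 and 20)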

Example

z = 1.96, two-tailed: p = 2 × (1 − Φ(1.96)) = 2 × 0.025 = 0.050 — just at the α=0.05 threshold.
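
The same number drops out of a one-liner, assuming the scipy sketch above:

    # 2 × (1 − Φ(1.96)) via the normal survival function
    from scipy.stats import norm
    print(2 * norm.sf(1.96))  # 0.04999579..., i.e. 0.050 to three decimals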

Frequently Asked Questions

  • What is a p-value? A p-value is the probability of observing a test statistic at least as extreme as the one computed from your sample, assuming the null hypothesis is true. More concisely: it measures how surprising your data would be if H₀ were correct. A small p-value (close to 0) means the observed data would be very unlikely under the null hypothesis, providing evidence against it. A large p-value means the data are consistent with H₀. Critically, the p-value does NOT tell you the probability that your hypothesis is true or false, the probability that the result was due to chance, or the probability that you'll get the same result if you repeat the study — these are common and serious misinterpretations. The p-value is computed from your sample, so it is itself a random variable that varies from sample to sample. R.A. Fisher introduced the p-value in 1925 as a measure of evidence against a null hypothesis, not as a binary accept/reject mechanism.
  • Why is 0.05 the standard threshold? The 0.05 threshold is largely a historical convention traced to R.A. Fisher's 1925 book 'Statistical Methods for Research Workers.' Fisher suggested that a result is 'significant' if it would occur by chance fewer than 1 in 20 times, but he never intended this as a rigid rule — he considered it a rough guide. Neyman and Pearson later formalized the decision-theoretic framework with fixed α and β (Type I and Type II error rates). The 0.05 threshold became entrenched through widespread adoption in textbooks, journals, and statistical software. Alternative thresholds exist for good reasons: α = 0.01 is used in high-stakes medical research where false positives are costly; α = 0.10 is used in exploratory research; and particle physics uses α ≈ 0.0000003 (the 5-sigma standard) because confirming a new particle requires extraordinary evidence. The 2016 ASA statement explicitly cautioned against treating 0.05 as a bright line and encouraged reporting exact p-values with effect sizes and confidence intervals.
  • How does a p-value differ from a confidence interval? A p-value and a confidence interval (CI) address related but different questions. A p-value tests whether an effect is zero under the null hypothesis — a yes/no question. A CI estimates the plausible range of the true effect size — a quantitative question. A 95% CI for a mean difference that excludes zero is equivalent to a two-tailed p-value below 0.05, but the CI provides far more information: it conveys both the statistical significance and the practical magnitude of the effect. A mean difference of 0.1 might be statistically significant with a huge sample (p = 0.001), yet a 95% CI of [0.05, 0.15] reveals the effect is tiny; conversely, a CI of [−2, 18] (spanning zero) shows the effect might be large but the study is underpowered. The ASA and most journals now encourage reporting CIs alongside (or instead of) p-values, and the APA Publication Manual recommends reporting both. In summary: the p-value answers 'is there an effect?'; the CI answers 'how large is the effect, and how precisely do we know it?' (The first sketch after this list illustrates the distinction.)
  • What is p-hacking? p-hacking refers to manipulating the data analysis until a statistically significant result is obtained — and then reporting only that result. Common practices include collecting data until p < 0.05, dropping inconvenient outliers, trying different statistical tests until one gives p < 0.05, selectively including or excluding covariates, or running the same test on many subgroups and reporting only the significant ones. Each additional test at α = 0.05 carries a 5% chance of a false positive, so with 20 tests you expect one significant result purely by chance; p-hacking therefore inflates the actual false positive rate far above the nominal α (the second sketch after this list simulates this inflation). John Ioannidis's landmark 2005 paper 'Why Most Published Research Findings Are False' argued that the majority of published findings in many fields are false positives, largely due to these practices. Solutions include pre-registration (publishing your analysis plan before data collection), multiple-comparison corrections (Bonferroni, FDR), open data and code sharing, and replication. The replication crisis in psychology and medicine (2011–present) made p-hacking a major focus of scientific reform.
  • Does p < 0.05 prove my hypothesis? No — p < 0.05 is a signal, not a verdict. First, statistical significance and practical significance are different concepts: a study with n = 100,000 might detect a mean difference of 0.001 units at p = 0.001, yet the effect is negligible in the real world, so always pair p-values with effect sizes (Cohen's d for means, η² for ANOVA, r for correlations) and confidence intervals. Second, statistical significance depends on sample size — the null hypothesis is rarely literally true (exact equality almost never holds in nature), so a large enough sample will reject virtually any point null. Third, context matters: α = 0.05 may be appropriate for exploratory research but too lenient for drug approval decisions; Benjamin et al. (2018) proposed lowering the standard threshold to α = 0.005 for claims of new discoveries. The consensus view is to treat the p-value as one piece of evidence among many — alongside effect size, CI, prior probability, and replication — rather than as the sole arbiter of truth.
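
A hedged illustration of the CI and effect-size points above. The data are simulated (a true mean difference of 0.1 with SD 1), so the exact numbers are not from any real study:

    # Statistically significant but practically tiny: with n = 100,000
    # per group, a 0.1-unit difference gives a minuscule p-value, while
    # the 95% CI and Cohen's d reveal how small the effect really is.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 100_000
    a = rng.normal(0.0, 1.0, n)   # "control" group
    b = rng.normal(0.1, 1.0, n)   # "treatment" group, true difference 0.1

    t_stat, p = stats.ttest_ind(a, b)
    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se              # approx. 95% CI
    d = diff / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)  # Cohen's d

    print(f"p = {p:.2e}")                    # far below 0.05
    print(f"95% CI = [{lo:.3f}, {hi:.3f}]")  # narrow and excludes 0
    print(f"Cohen's d = {d:.2f}")            # ~0.10: 'small' by convention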
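And a small simulation of the multiple-testing arithmetic from the p-hacking answer: 20 tests on pure noise, with and without the Bonferroni correction (again, all numbers are simulated):

    # Family-wise error rate with 20 independent tests on null data.
    # Theory: 1 - 0.95**20 ≈ 0.64 uncorrected; Bonferroni restores ~0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    alpha, m, trials = 0.05, 20, 2_000
    raw = bonf = 0

    for _ in range(trials):
        data = rng.normal(0.0, 1.0, size=(m, 30))  # true mean is exactly 0
        p = stats.ttest_1samp(data, 0.0, axis=1).pvalue
        raw  += (p < alpha).any()        # any of the 20 "significant"?
        bonf += (p < alpha / m).any()    # Bonferroni-corrected threshold

    print(f"uncorrected: {raw / trials:.2f}")   # ~0.64
    print(f"Bonferroni:  {bonf / trials:.2f}")  # ~0.05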

Sources & References (4)
  1. Fisher, R.A. (1925). Statistical Methods for Research Workers. Oliver and Boyd.
  2. Wasserstein, R.L. & Lazar, N.A. (2016). "The ASA Statement on p-Values: Context, Process, and Purpose." The American Statistician, 70(2). American Statistical Association.
  3. NIST/SEMATECH Engineering Statistics Handbook, Hypothesis Testing. NIST.
  4. Stanford CS109: Probability for Computer Scientists. Stanford University.