Forest plots, publication bias, and calibrated conclusions
Click any row to learn what each element of a forest plot means. The diamond at the bottom represents the pooled estimate across all trials.
Example meta-analysis — five trials, low heterogeneity (I² = 18%)
Study                     Weight   ES [95% CI]
Smith 2019 (n=120, RCT)   18.4%    0.42 [0.18, 0.66]
Chen 2020 (n=84, RCT)     14.2%    0.38 [0.12, 0.64]
Patel 2021 (n=210, RCT)   32.1%    0.35 [0.18, 0.52]
Rossi 2022 (n=66, RCT)    11.8%    0.12 [−0.26, 0.50]
Kim 2023 (n=180, RCT)     23.5%    0.40 [0.18, 0.62]
Pooled effect             100%     0.37 [0.24, 0.50]
(In the plot, effects to the right of zero favour the intervention.)
I² (heterogeneity): 18% · Pooled p = 0.001 · No. of trials: 5 · Total N: 660
Box size = weight
Larger boxes represent trials that contributed more to the pooled estimate — usually because they enrolled more participants or had lower variance.
Diamond = pooled result
The diamond's centre is the pooled point estimate, and its width spans the pooled confidence interval. A diamond that crosses the line of no effect (zero here) means the pooled result is not statistically significant.
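To make the weighting concrete, here is a minimal Python sketch (hypothetical code, not anything from this page) that recovers inverse-variance weights and a pooled estimate from the effect sizes and 95% CIs in the example table. The table's displayed weights and pooled interval presumably come from a random-effects model, so the numbers will be close but not identical.

```python
import math

# Effect sizes and 95% CIs from the example table above: (ES, lower, upper).
trials = {
    "Smith 2019": (0.42, 0.18, 0.66),
    "Chen 2020":  (0.38, 0.12, 0.64),
    "Patel 2021": (0.35, 0.18, 0.52),
    "Rossi 2022": (0.12, -0.26, 0.50),
    "Kim 2023":   (0.40, 0.18, 0.62),
}

def pool_fixed_effect(trials):
    """Inverse-variance (fixed-effect) pooling.

    Each trial's SE is recovered from its 95% CI width:
    SE = (upper - lower) / (2 * 1.96). The weight is 1 / SE^2, which is
    why bigger, more precise trials get bigger boxes on the forest plot.
    """
    weights = {}
    for name, (es, lo, hi) in trials.items():
        se = (hi - lo) / (2 * 1.96)
        weights[name] = 1.0 / se**2
    total_w = sum(weights.values())
    pooled = sum(w * trials[n][0] for n, w in weights.items()) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    rel = {n: w / total_w for n, w in weights.items()}
    return pooled, ci, rel

pooled, ci, rel = pool_fixed_effect(trials)
print(f"Pooled ES: {pooled:.2f} [{ci[0]:.2f}, {ci[1]:.2f}]")
for name, w in rel.items():
    print(f"  {name}: {100 * w:.1f}%")
```

Running this gives a pooled estimate of roughly 0.36 [0.26, 0.46], close to the displayed 0.37 [0.24, 0.50]; a random-effects model would widen the interval by adding a between-study variance term to each weight.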
The I² statistic quantifies the proportion of variability that reflects genuine between-study differences rather than chance. Compare these two scenarios to see what low and high heterogeneity look like in practice.
Consistent findings — trials tell a similar story
Study     Weight   ES [95% CI]
Trial A   28%      0.38 [0.20, 0.56]
Trial B   22%      0.42 [0.22, 0.62]
Trial C   35%      0.33 [0.18, 0.48]
Trial D   15%      0.36 [0.14, 0.58]
Pooled    100%     0.37 [0.28, 0.46]
I²: 12% · Interpretation: low unexplained variability
The trials are telling a consistent story. The squares cluster in the same region, their intervals largely overlap, and the pooled diamond is narrow. An I² of 12% means most of the variability is attributable to chance rather than genuine between-study differences. The pooled estimate is a reasonable summary.
Inconsistent findings: the trials are not telling a consistent story. The squares are spread across a wide range, some showing benefit and others showing harm. An I² of 84% means most of the variability is not explained by chance, though whether that reflects true biological differences across populations, bias, or measurement differences requires investigation. The pooled diamond is wide and crosses zero. Accepting it as a reliable summary would be misleading; the appropriate response is to investigate what is driving the spread.
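The I² calculation behind both interpretations can be sketched in a few lines. The data below are hypothetical, chosen to make one clearly heterogeneous and one clearly homogeneous set; the I² values in the scenarios above are illustrative and not exactly recoverable from the rounded CIs.

```python
def heterogeneity(effects, ses):
    """Cochran's Q and the I² statistic.

    Q sums the squared, inverse-variance-weighted deviations of each
    study's effect from the pooled (fixed-effect) estimate. Under pure
    sampling error Q is expected to be about k - 1 for k studies, so the
    excess, (Q - (k - 1)) / Q, estimates the fraction of variability that
    reflects genuine between-study differences. I² is floored at 0.
    """
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Hypothetical data: three studies that disagree strongly...
q_hi, i2_hi = heterogeneity([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
# ...and three that agree almost exactly.
q_lo, i2_lo = heterogeneity([0.39, 0.40, 0.41], [0.1, 0.1, 0.1])
print(f"Disagreeing trials: Q = {q_hi:.1f}, I² = {100 * i2_hi:.0f}%")
print(f"Agreeing trials:    Q = {q_lo:.2f}, I² = {100 * i2_lo:.0f}%")
```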
A funnel plot shows each study's effect size against its precision. In a symmetric, unbiased literature, studies form an inverted funnel. When small null studies are missing — the pattern publication bias predicts — the funnel develops a gap.
Symmetric funnel: studies are distributed symmetrically around the centre line. Large, precise studies cluster at the top; smaller studies scatter evenly on both sides at the bottom. There is no systematic absence of studies in any region.
Asymmetric funnel: small studies with null or negative results are absent from the lower-left region; only positive small studies appear. This asymmetry suggests that unpublished null results may exist and that the pooled estimate from the published literature is likely inflated. Asymmetry should be investigated, not ignored.
Important caveat. Funnel plot asymmetry does not prove publication bias — it can also arise from genuine heterogeneity, small-study effects unrelated to bias, or chance when the number of studies is small (generally fewer than 10). It is a signal to investigate further, not a definitive finding. Formal tests such as Egger's test provide a quantitative complement to visual inspection but are not definitive either.
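Egger's test can be sketched as a regression of each study's standardized effect (effect divided by SE) on its precision (1 / SE): a symmetric funnel yields an intercept near zero, while a nonzero intercept signals small-study effects. This is a simplified illustration with hypothetical numbers; the full test also weights the regression and reports a significance test on the intercept.

```python
import numpy as np

def egger_intercept(effects, ses):
    """Simplified Egger regression: standardized effect vs precision.

    Returns (intercept, slope). A symmetric funnel gives an intercept
    near zero; a positive intercept means small (high-SE) studies report
    systematically larger standardized effects.
    """
    effects, ses = np.asarray(effects), np.asarray(ses)
    z = effects / ses       # standardized effects
    precision = 1.0 / ses   # large, precise studies -> large values
    slope, intercept = np.polyfit(precision, z, 1)
    return intercept, slope

ses = np.array([0.05, 0.08, 0.12, 0.20, 0.30])

# Unbiased literature: every study estimates the same true effect, 0.3.
sym_int, _ = egger_intercept(0.3 + 0.0 * ses, ses)

# Biased literature: small studies overestimate (effect = 0.3 + 0.5 * SE),
# as if their null counterparts were never published.
asym_int, _ = egger_intercept(0.3 + 0.5 * ses, ses)

print(f"Symmetric funnel intercept:  {sym_int:.3f}")
print(f"Asymmetric funnel intercept: {asym_int:.3f}")
```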
Drawing a conclusion from a body of evidence requires specifying what it supports, at what level of confidence, for which outcome, in which population. These examples show how the same evidence can support very different statements depending on scope.
Strong (Clinical)
Replicated across multiple large, independent, pre-registered trials measuring patient-relevant outcomes. Results are consistent, effect sizes are clinically meaningful, confidence intervals are narrow. Example: creatine for strength and lean mass in resistance training contexts.
Moderate (Clinical)
Meaningful evidence base with some limitations — smaller trials, less independent replication, or modest effect sizes. Supports a considered position but not high confidence. Example: magnesium for sleep quality in people with low dietary intake.
Moderate (Biomarker)
Consistent evidence for a biomarker change, but the relationship to clinical outcomes has not been established. The evidence supports the biomarker effect, not the clinical implication marketing may attach to it. Example: berberine reducing fasting glucose — consistent, but clinical outcome translation is less certain.
Emerging
Early signals from limited or preliminary evidence. Insufficient for confident claims but worth monitoring. Mechanistic rationale may be strong; human trial evidence is thin or inconsistent. Example: many newer longevity-focused compounds with small early trials.
Insufficient
The honest position when no meaningful conclusion can be drawn. This is not a failure; it is an accurate representation of where the evidence currently sits, and it protects readers from overconfident claims in either direction.
Calibrated uncertainty is not the same as saying nothing. A statement that distinguishes what is supported from what is not — that names the population, the outcome, the dose, and the confidence level — is far more useful than either "this works" or "we can't know." The goal is honesty about the current state of the evidence, not false resolution in either direction.