What Sample Sizes Do You Need for Statistically Valid Ad Tests?

Calculate the right sample sizes for your Meta ad tests to ensure results are statistically valid and actionable for decision-making.

Yaron Been

Founder @ ROASPIG

Why Does Sample Size Matter for Ad Testing?

Sample size determines whether your test results reflect reality or random chance. Underpowered tests—those with insufficient data—produce unreliable results that lead to poor decisions. You might kill a winning ad or scale a loser simply because you didn't collect enough data.

Proper sample size calculation before testing ensures you can trust your results and make confident decisions based on actual performance differences.

The Risks of Insufficient Sample Size

  • False positives: Declaring a winner when differences are random noise
  • False negatives: Missing real improvements because the test lacked power
  • Wasted budget: Running tests that can't produce valid conclusions
  • Wrong decisions: Scaling losers or killing winners based on unreliable data

What Factors Determine Required Sample Size?

Key Variables in Sample Size Calculation

  • Baseline conversion rate: Your current performance level
  • Minimum detectable effect (MDE): Smallest improvement worth detecting
  • Significance level: Typically alpha = 0.05, i.e. 95% confidence
  • Statistical power: Typically 80% (beta = 0.20)
  • Number of variants: More variants require larger samples
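
One common way these variables combine is sketched below. It assumes a two-sided z-test comparing two proportions, which is an assumption here since no specific formula is named above. In this notation p_1 is the baseline rate, p_2 is the rate the variant would need to reach, and n is the sample per variant; total traffic then scales with the number of variants.

```latex
n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{(p_2 - p_1)^{2}},
\qquad p_2 = p_1\,(1 + \mathrm{MDE})
```

With alpha = 0.05 and beta = 0.20, the two z-values are approximately 1.96 and 0.84.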

Understanding Minimum Detectable Effect

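MDE is the smallest improvement you want to reliably detect. Smaller effects require larger samples, and the growth is steep: required sample size scales roughly with the inverse square of the effect, so halving the MDE about quadruples the traffic you need:

  • 10% relative improvement: Large sample size needed
  • 20% relative improvement: Moderate sample size needed (about a quarter of the 10% case)
  • 50% relative improvement: Small sample size needed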

Choose MDE based on business impact. If a 10% improvement matters, design for that. If only 50%+ improvements are actionable, you need less data.

How Do You Calculate Required Sample Size?

Sample Size Guidelines by Conversion Rate

For 95% confidence (two-sided) and 80% power, detecting a 20% relative improvement with a standard two-proportion test:

  • 1% baseline conversion rate: ~42,700 visitors per variant
  • 2% baseline conversion rate: ~21,100 visitors per variant
  • 5% baseline conversion rate: ~8,200 visitors per variant
  • 10% baseline conversion rate: ~3,800 visitors per variant
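
Guidelines like these can be reproduced with a few lines of code. The sketch below is a minimal version that assumes a two-sided two-proportion z-test with the normal approximation; the function name and parameters are illustrative, and other calculators may use slightly different formulas and give somewhat different figures.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant (two-sided two-proportion z-test, normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)              # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Required visitors per variant for a 20% relative lift at several baselines.
for rate in (0.01, 0.02, 0.05, 0.10):
    n = sample_size_per_variant(rate, relative_mde=0.20)
    print(f"{rate:.0%} baseline: {n:,} visitors per variant")
```

Changing `relative_mde` or `power` shows how quickly the requirement moves, which is useful when deciding what effect size is realistic to test for.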

Conversion-Based Rules of Thumb

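For practical planning, treat these conversion counts as rough floors. They are enough to detect large differences only: at 95% confidence and 80% power, about 100 conversions per variant resolves a relative lift of roughly 40% or more.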

  • Minimum viable: 50 conversions per variant
  • Recommended: 100 conversions per variant
  • High confidence: 200+ conversions per variant

Higher baseline conversion rates reach these thresholds faster with less traffic.

How Do You Apply Sample Size to Test Planning?

Step 1: Estimate Required Conversions

  • Determine baseline conversion rate: Check historical campaign data
  • Set minimum detectable effect: Smallest improvement worth acting on
  • Calculate conversions needed: Use a sample size calculator or the guidelines above

Step 2: Calculate Required Traffic

  • Traffic per variant: Conversions needed / Conversion rate
  • Total traffic: Traffic per variant x Number of variants
  • Example: 100 conversions at 2% rate = 5,000 visitors per variant

Step 3: Estimate Test Duration

  • Daily traffic: Your current daily visitors or impressions
  • Days needed: Total traffic / Daily traffic
  • Minimum duration: At least 7 days to capture weekly patterns
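
The three steps above can be chained into one small planning helper. This is a sketch only: the function name is hypothetical, and the daily traffic figure of 800 is an invented illustration, not a benchmark.

```python
from math import ceil


def plan_test(conversions_per_variant, conversion_rate, num_variants,
              daily_traffic, min_days=7):
    """Turn a conversion target into traffic and duration estimates."""
    # Step 2: traffic per variant = conversions needed / conversion rate
    traffic_per_variant = ceil(conversions_per_variant / conversion_rate)
    total_traffic = traffic_per_variant * num_variants
    # Step 3: days needed, never shorter than one full weekly cycle
    days_needed = max(min_days, ceil(total_traffic / daily_traffic))
    return traffic_per_variant, total_traffic, days_needed


# 100 conversions per variant at a 2% rate, 2 variants, 800 visitors per day (assumed).
per_variant, total, days = plan_test(
    conversions_per_variant=100,
    conversion_rate=0.02,
    num_variants=2,
    daily_traffic=800,
)
print(f"{per_variant:,} per variant, {total:,} total, {days} days")
```

If the resulting duration is longer than you can afford, the inputs to revisit are the number of variants and the effect size you are trying to detect.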

What Are Common Sample Size Mistakes?

  • Stopping early: Ending tests when one variant looks ahead
  • Ignoring multiple comparisons: Testing many variants without adjusting significance
  • Post-hoc rationalization: Finding "significance" in underpowered tests
  • Forgetting weekly patterns: Weekend traffic behaves differently from weekday traffic
  • Not accounting for variance: High-variance metrics need more data
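
For the multiple-comparisons mistake, one simple and conservative adjustment is the Bonferroni correction: divide alpha by the number of comparisons. The sketch below assumes every challenger is compared against a single control, and it estimates how much larger each variant's sample must be once the threshold is tightened. Bonferroni is just one option chosen here for illustration; the function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, num_variants):
    """Per-comparison alpha when each challenger is compared to one control."""
    comparisons = max(1, num_variants - 1)
    return alpha / comparisons


def sample_size_inflation(alpha, num_variants, power=0.80):
    """Factor by which the per-variant sample grows after the correction."""
    z_power = NormalDist().inv_cdf(power)
    z_plain = NormalDist().inv_cdf(1 - alpha / 2)
    z_adjusted = NormalDist().inv_cdf(1 - bonferroni_alpha(alpha, num_variants) / 2)
    return ((z_adjusted + z_power) / (z_plain + z_power)) ** 2


for k in (2, 3, 5):
    adjusted = bonferroni_alpha(0.05, k)
    factor = sample_size_inflation(0.05, k)
    print(f"{k} variants: alpha per comparison = {adjusted:.4f}, sample x{factor:.2f}")
```

The extra cost per variant comes on top of the extra traffic needed simply because there are more variants to fill.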

How to Avoid Peeking Bias

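Every time you check results and would act on a significant reading, random noise gets another chance to cross the threshold, so repeated peeking inflates the false positive rate: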

  • Pre-commit to duration: Decide test length before starting
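  • Use sequential testing methods: If you need the option to stop early, use a design built for interim looks (for example, a group sequential test with adjusted thresholds) rather than a fixed-sample test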
  • Automate decisions: Remove temptation to check constantly
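
A small simulation can make the peeking effect concrete. The sketch below runs A/A tests, where both arms share the same true conversion rate so any declared winner is a false positive, and compares stopping at the first significant daily reading against a single look at the end. The conversion rate, daily traffic, duration, and function names are illustrative assumptions, not data from a real campaign.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided threshold for alpha = 0.05


def is_significant(conv_a, conv_b, n):
    """Two-sided pooled z-test for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > Z_CRIT


def false_positive_rates(rate=0.05, daily_visitors=200, days=14, sims=1000, seed=0):
    """Return (rate with daily peeking, rate with one final look) for A/A tests."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        ever_significant = False
        for day in range(1, days + 1):
            # Simulate one day of visitors per arm, each converting with the same rate.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            if is_significant(conv_a, conv_b, day * daily_visitors):
                ever_significant = True  # a daily peeker would have stopped here
        peeking_hits += ever_significant
        final_hits += is_significant(conv_a, conv_b, days * daily_visitors)
    return peeking_hits / sims, final_hits / sims


peeking_rate, final_rate = false_positive_rates()
print(f"Daily peeking: {peeking_rate:.1%} false positives")
print(f"Single final look: {final_rate:.1%} false positives")
```

The single-look rate should land near the nominal 5%, while the peeking rate comes out several times higher, even though there is no real difference between the arms.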

How Do You Handle Low-Traffic Situations?

Strategies for Limited Budget

  • Test fewer variants: 2 variants instead of 5
  • Accept larger MDE: Only detect 50%+ improvements
  • Use directional signals: Accept a lower confidence level for exploratory tests and treat the result as a hint, not a verdict
  • Extend duration: Run longer to accumulate more data
  • Focus on high-impact tests: Prioritize tests most likely to show clear differences
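
When traffic is the fixed constraint, it helps to turn the question around: given the visitors you can afford per variant, what is the smallest lift the test could reliably detect? The sketch below answers that by bisection over the same two-proportion z-test approximation (an assumption, as are the function names and the example inputs).

```python
from statistics import NormalDist


def required_n(baseline, lift, alpha=0.05, power=0.80):
    """Visitors per variant needed to detect a given relative lift."""
    p1, p2 = baseline, baseline * (1 + lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2


def smallest_detectable_lift(baseline, visitors_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift detectable with the available traffic (bisection)."""
    lo, hi = 1e-4, (1 / baseline) - 1  # hi keeps the variant rate at or below 100%
    for _ in range(60):
        mid = (lo + hi) / 2
        if required_n(baseline, mid, alpha, power) > visitors_per_variant:
            lo = mid  # not enough traffic for this lift; try a larger one
        else:
            hi = mid
    return hi


# Example: 2% baseline with 5,000 visitors available per variant (assumed figures).
lift = smallest_detectable_lift(baseline=0.02, visitors_per_variant=5000)
print(f"Smallest reliably detectable relative lift: {lift:.0%}")
```

If the answer is larger than any improvement you realistically expect from the change being tested, the test is unlikely to be worth running in its current form.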

How Does ROASPIG Help with Test Planning?

  • Efficient variant creation: Generate only needed variants, not excessive options
  • Systematic testing: Plan sequential tests to build on winners
  • Rapid iteration: Quickly create new variants when tests conclude
  • Creative documentation: Track which variants were tested and results
  • Template consistency: Ensure clean tests without accidental variation

Conclusion

Sample size calculation is foundational to valid ad testing. Before launching any test, know how many conversions you need per variant and how long the test must run. Underpowered tests waste budget on unreliable data. Properly powered tests generate insights you can trust and act on confidently.

Frequently Asked Questions About Sample Size Ad Testing

Why does sample size matter for ad testing?

Sample size determines whether results reflect reality or random chance. Underpowered tests produce unreliable results: you might kill winners or scale losers based on noise. Proper sample size ensures you can trust your conclusions.

How many conversions do you need per variant?

Minimum viable: 50 conversions per variant. Recommended: 100 conversions per variant. High confidence: 200+ conversions per variant. Higher baseline conversion rates reach these thresholds faster with less traffic.

What is minimum detectable effect (MDE)?

MDE is the smallest improvement you want to reliably detect. Smaller effects require larger samples. If only 50%+ improvements are actionable, you need less data. If 10% improvements matter, design tests to detect those.

How long should an ad test run?

Calculate: Total traffic needed / Daily traffic = Days needed. Run for at least 7 days to capture weekly patterns. Don't stop early when one variant looks ahead; doing so introduces bias and increases false positive rates.

What can you do when traffic is low?

Options: test fewer variants (2 instead of 5), accept larger MDE (only detect 50%+ differences), extend test duration, use directional signals with lower confidence, or focus budget on fewer high-impact tests.
