The Incrementality Problem: What Your A/B Test Is Not Showing You

Diego Isaac writes about the science behind brand decisions. A strategist grounded in evidence, academic research, and market data.

Beyond Advertising

TL;DR

A/B tests measure variations between versions. Incrementality measures whether a marketing action generated results that, without it, would simply not exist. Confusing these two concepts leads companies to optimize creatives for campaigns that generate zero incremental sales. Lewis and Rao demonstrated in 2015 that the effect of advertising on sales operates at such a small scale compared to the natural noise in purchasing behavior that reliable experiments require enormous samples. Meanwhile, Blake, Nosko, and Tadelis showed at eBay that brand keyword search ads had no measurable short-term benefit, and that non-brand search ads did not change the purchasing behavior of frequent buyers. The incrementality problem demands new methods. Ghost Ads, geo-experiments, and Marketing Mix Modeling offer real paths forward.

The test everyone runs and almost nobody interprets

An e-commerce company decides to test two versions of an ad on Meta. Version A with a blue background, version B with red. They run the test for two weeks. Version B generates 14% more clicks. Celebration. Report. New creative approved.

Three questions nobody asked. How many of those people would have purchased anyway, without seeing any ad? Is the measured effect real or statistical noise? If the entire budget for this campaign were cut, what would happen to total sales?

These questions define the concept of incrementality. And the traditional A/B test, by design, ignores all of them.

An A/B test compares relative performance between variants. It answers the question: "Does A or B work better?" What it ignores: whether both variants produce results superior to the scenario with no intervention at all. Optimizing between variations of something that produces zero incremental impact creates the illusion of progress.

10M+ Person-weeks required for a statistically reliable advertising experiment, according to Lewis and Rao (2015), published in the Quarterly Journal of Economics

The unfavorable economics of measurement

In 2015, Randall Lewis and Justin Rao published in the Quarterly Journal of Economics a study that should have reshaped the entire marketing measurement industry. The title said it all: "The Unfavorable Economics of Measuring the Returns to Advertising."

The core argument: the causal effect of advertising on sales tends to be small when compared to the natural variance in consumer behavior. In less technical language, people buy and stop buying for reasons that transcend any campaign. Paycheck arrived. Refrigerator broke. A friend recommended something. A competitor's offer appeared. The background noise of human behavior is enormous.

When the signal (advertising effect) is small and the noise (behavioral variance) is large, detecting that signal requires absurdly large samples. Lewis and Rao calculated that informative advertising experiments frequently require more than 10 million person-weeks. To put this in context, that means exposing millions of people to controlled conditions for weeks.
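The signal-to-noise argument can be made concrete with the textbook sample-size formula for comparing two means. The sketch below is only an illustration: the standard deviation and effect size are hypothetical placeholders, not figures taken from Lewis and Rao, and the function name is made up for this example.

```python
from statistics import NormalDist


def person_weeks_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided, two-sample z-test on mean sales.

    sigma: standard deviation of weekly sales per person (the noise)
    delta: difference in mean weekly sales we want to detect (the signal)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2


# Hypothetical inputs: noisy weekly sales, tiny per-person ad effect.
sigma = 75.0  # dollars
delta = 0.35  # dollars

n = person_weeks_per_arm(sigma, delta)
print(f"{n:,.0f} person-weeks per arm, {2 * n:,.0f} in total")
```

Because the required sample grows with the square of sigma / delta, halving the detectable effect quadruples the sample. That quadratic penalty is what pushes realistic requirements into the millions.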

Most companies run tests with samples hundreds or thousands of times smaller. The result: statistically insignificant conclusions treated as evidence, confirmation bias reinforcing preexisting beliefs, and billion-dollar budget decisions based on noise.

"Despite these unfavorable economics, randomized control trials represent progress by injecting new, unbiased information into the market. The inference challenges revealed in the field experiments also show that selection bias, due to the targeted nature of advertising, is a crippling concern for widely employed observational methods." Lewis & Rao, The Quarterly Journal of Economics, 2015

The eBay case: when the truth is zero

In 2015, Thomas Blake, Chris Nosko, and Steven Tadelis published in Econometrica a study that embarrassed the search marketing industry. Working with eBay, the three economists ran a series of large-scale field experiments: paid search ads were paused for brand keywords (like "eBay"), and non-brand keyword ads were switched off in randomly selected U.S. regions.

The market hypothesis: branded search ads drive sales. The experimental reality: brand keyword ads showed no measurable short-term benefit. People searching for "eBay" on Google went to eBay anyway, clicking on the organic result just below the paid ad.

eBay was essentially paying Google to capture traffic that was already theirs. The last-click attribution model credited every sale to the search ad, creating an illusion of effectiveness. When the ad disappeared, sales continued.

What the eBay study revealed

Turning off brand keyword ads had no measurable short-term impact on sales. Last-click attribution artificially inflated reported ROI. The lesson: without a real control group, platform metrics measure credit capture, not demand generation.

For non-brand keywords, results depended on the buyer. New and infrequent users were positively influenced by the ads. Frequent users, who accounted for most of the ad spend, were not, and average returns came out negative, well below what attribution models suggested.

Why standard A/B testing fails at incrementality measurement

A/B testing was designed for product optimization, where the experimental unit (the user) receives a different experience in a controlled environment. In the software world, it works well. In advertising, five structural problems emerge.

1. No true holdout

An A/B test between creatives compares version A against version B. To measure incrementality, you need version A against nothing. One group that sees the ad and another group that receives total silence. Most tests never include this pure control group.
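A minimal sketch of what the holdout comparison looks like in practice: the conversion rate of the exposed group minus the conversion rate of the pure holdout, with a normal-approximation confidence interval. The counts are hypothetical and the function name is invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def incremental_lift(conv_treat, n_treat, conv_hold, n_hold, confidence=0.95):
    """Absolute lift in conversion rate (treatment minus holdout) with a Wald CI."""
    p_t = conv_treat / n_treat
    p_h = conv_hold / n_hold
    diff = p_t - p_h
    se = sqrt(p_t * (1 - p_t) / n_treat + p_h * (1 - p_h) / n_hold)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 100,000 users see the ad, 100,000 see nothing.
diff, (lo, hi) = incremental_lift(2_150, 100_000, 2_000, 100_000)
incremental_conversions = diff * 100_000

print(f"lift: {diff:.4%}  CI: [{lo:.4%}, {hi:.4%}]")
print(f"incremental conversions: {incremental_conversions:.0f}")
```

The quantity of interest is the difference against the holdout, not the treatment group's raw conversions. In this made-up example the treatment group converts 2,150 times, yet only about 150 of those conversions are incremental.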

2. SUTVA violation

Kohavi, Tang, and Xu discuss in Trustworthy Online Controlled Experiments (2020) the Stable Unit Treatment Value Assumption (SUTVA): each unit's outcome must depend only on its own treatment assignment, not on the assignment of other units. In advertising, this rarely holds. Someone in the treatment group mentions the ad to someone in the control group. A family member sees the ad and influences another's purchase decision. The effect contaminates the groups.

3. Incompatible time scales

Brand advertising operates on long cycles. Marketing Mix Modeling and Binet and Field's research show that brand building produces its greatest effects between 6 months and 3 years after exposure. A two-week A/B test captures, at best, short-term activation effects. The real incremental impact of brand building remains invisible.

4. Statistical power problem

Lewis and Rao calculated that, with typical conversion rates (1-3%) and small advertising effects (0.1-0.5% incremental lift), achieving 80% statistical power requires samples that most companies simply do not have. Running a test with insufficient sample size produces results with confidence intervals so wide they include both "the campaign doubled sales" and "the campaign reduced sales." Both conclusions fit the data.
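To see how wide those intervals get, the sketch below computes a confidence interval for relative lift using the standard log risk-ratio approximation. It runs the same observed 10% lift at two sample sizes. All counts are hypothetical, and the function name is invented for this example.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_lift_ci(conv_t, n_t, conv_c, n_c, confidence=0.95):
    """Relative lift (treatment rate / control rate - 1) with a log-ratio CI."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    log_rr = log(p_t / p_c)
    # Standard error of the log risk ratio.
    se = sqrt((1 - p_t) / conv_t + (1 - p_c) / conv_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_t / p_c - 1, exp(log_rr - z * se) - 1, exp(log_rr + z * se) - 1


# Hypothetical: 2.0% control rate, 2.2% treatment rate, two sample sizes.
small = relative_lift_ci(22, 1_000, 20, 1_000)
large = relative_lift_ci(22_000, 1_000_000, 20_000, 1_000_000)

for label, (point, lo, hi) in (("1k per arm", small), ("1M per arm", large)):
    print(f"{label}: lift {point:+.1%}, CI [{lo:+.1%}, {hi:+.1%}]")
```

With a thousand users per arm, the interval spans a substantial loss and a very large gain at the same time. Only at the much larger sample does it exclude zero.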

5. Selection effects

Ad platforms optimize delivery to people most likely to convert. This creates selection bias. The people who see the ad were already predisposed to buy. The ad receives credit for sales that would have happened without it. Measuring incrementality requires separating the selection effect from the exposure effect.
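A small simulation illustrates the gap between the selection effect and the exposure effect. Everything here is assumed for illustration: the distribution of baseline purchase probabilities, the rule that the platform targets the top 20% of users, and the size of the true ad effect.

```python
import random

random.seed(7)
TRUE_LIFT = 0.002  # hypothetical: the ad adds 0.2 points of purchase probability

# Baseline purchase probability per user, before any advertising.
users = [random.betavariate(1, 30) for _ in range(200_000)]
cutoff = sorted(users)[int(0.8 * len(users))]  # platform targets the top 20%
targeted = [p for p in users if p >= cutoff]
untargeted = [p for p in users if p < cutoff]


def rate(probabilities, lift=0.0):
    """Simulate purchases and return the observed conversion rate."""
    bought = [random.random() < min(1.0, p + lift) for p in probabilities]
    return sum(bought) / len(bought)


# Observational comparison: everyone targeted is exposed, everyone else is not.
naive = rate(targeted, TRUE_LIFT) - rate(untargeted)

# Randomized holdout inside the targeted audience.
random.shuffle(targeted)
half = len(targeted) // 2
experimental = rate(targeted[:half], TRUE_LIFT) - rate(targeted[half:])

print(f"true effect:        {TRUE_LIFT:.4f}")
print(f"naive estimate:     {naive:.4f}")
print(f"holdout estimate:   {experimental:.4f}")
```

The naive comparison mostly measures who was targeted, so it overstates the effect many times over. The randomized estimate is centered on the true effect but remains noisy at this sample size, which is the power problem from the previous section showing up again.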

≈ 0 Measurable short-term benefit of brand keyword search ads at eBay, according to Blake, Nosko, and Tadelis (2015), published in Econometrica

Three methods that measure what A/B tests miss

Ghost Ads: the invisible counterfactual

Johnson, Lewis, and Nubbemeyer proposed the Ghost Ads concept in 2017. The method works like this: when the ad auction determines that a control group user would have been shown the advertiser's ad, the system logs this ghost impression and serves whatever ad would otherwise have won. The analysis then compares the behavior of those who saw the real ad against those who would have seen it but received other content.

The advantage over PSA tests (where the control sees a generic public service ad) is counterfactual precision. Ghost Ads eliminate the cost of buying impressions for the control group and reduce the noise that Lewis and Rao identified. DoorDash implemented the methodology in 2025 to measure restaurant incrementality on their platform, reporting significantly tighter confidence intervals.
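The precision gain can be sketched with a toy simulation. It is a simplification, not the authors' implementation: the auction outcome is drawn independently of purchase propensity, and the reach, baseline rate, and effect size are hypothetical. In a real system reach would be estimated rather than known.

```python
import random
from math import sqrt

random.seed(11)
N = 200_000
REACH = 0.3     # hypothetical share of users for whom the ad would win the auction
BASE = 0.02     # hypothetical baseline purchase probability
EFFECT = 0.004  # hypothetical effect on users actually shown the ad

rows = []  # (in_treatment, would_see_ad, purchased)
for _ in range(N):
    treat = random.random() < 0.5
    would_see = random.random() < REACH  # logged in both arms: the "ghost" flag
    p = BASE + (EFFECT if treat and would_see else 0.0)
    rows.append((treat, would_see, random.random() < p))


def diff_and_se(a, b):
    """Difference in conversion rates and its standard error."""
    pa, pb = sum(a) / len(a), sum(b) / len(b)
    return pa - pb, sqrt(pa * (1 - pa) / len(a) + pb * (1 - pb) / len(b))


# Intent-to-treat: all treatment users vs all control users, rescaled by reach.
itt, itt_se = diff_and_se([y for t, _, y in rows if t],
                          [y for t, _, y in rows if not t])
itt_scaled, itt_scaled_se = itt / REACH, itt_se / REACH

# Ghost ads: exposed treatment users vs control users flagged as ghost-exposed.
ghost, ghost_se = diff_and_se([y for t, w, y in rows if t and w],
                              [y for t, w, y in rows if not t and w])

print(f"ITT (rescaled): {itt_scaled:.4f} +/- {itt_scaled_se:.4f}")
print(f"Ghost ads:      {ghost:.4f} +/- {ghost_se:.4f}")
```

Both estimators target the effect on exposed users. The ghost-ad comparison has the smaller standard error because it drops the users who could never have seen the ad and who only add noise.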

Geo-experiments: geography as randomization

Vaver and Koehler, Google researchers, formalized geo-experiments in 2011. The logic: divide geographic regions into test groups (receive campaign) and control groups (advertising silence). Measure the difference in sales between regions.

The advantage: circumvents the individual-level SUTVA problem. People in different cities rarely contaminate each other's behavior. The disadvantage: requires high volume per region and can be confounded by local factors (events, weather, regional competition). Chen, Longfils, and Remy at Google refined the method in 2021 with "Trimmed Match Design," which pairs similar regions before randomization and removes outliers to produce more robust estimates.
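As a rough illustration of the paired-region logic, the sketch below computes a plain difference-in-differences across matched geo pairs. This is not the Trimmed Match estimator, only the simplest version of the idea, and all sales figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Each tuple: (test_pre, test_post, control_pre, control_post) weekly sales
# for one matched pair of regions. Hypothetical numbers.
pairs = [
    (120.0, 131.0, 118.0, 121.0),
    (300.0, 318.0, 305.0, 311.0),
    (80.0, 86.0, 82.0, 83.0),
    (210.0, 222.0, 205.0, 209.0),
    (150.0, 155.0, 149.0, 151.0),
    (95.0, 104.0, 97.0, 99.0),
]


def diff_in_diff(pair):
    """Change in the test region minus change in its matched control region."""
    test_pre, test_post, ctrl_pre, ctrl_post = pair
    return (test_post - test_pre) - (ctrl_post - ctrl_pre)


effects = [diff_in_diff(p) for p in pairs]
estimate = mean(effects)
std_error = stdev(effects) / sqrt(len(effects))

print(f"incremental weekly sales per region: {estimate:.2f} +/- {std_error:.2f}")
```

Subtracting the control region's change removes shocks that hit both regions of a pair, such as seasonality. A single pair with a local event can still dominate the average, which is the problem that trimming outliers is meant to address.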

Marketing Mix Modeling: the systemic view

Where experiments isolate a single channel, Marketing Mix Modeling (MMM) models the entire system. Using aggregate data on sales, media investment, seasonality, pricing, and distribution, econometric models estimate each channel's incremental contribution.
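A heavily simplified sketch of the mechanics: a geometric adstock transform carries part of each week's spend into later weeks, and a regression of sales on adstocked spend estimates the channel's contribution. It assumes a single channel, a known decay rate, and synthetic data. Production tools estimate many channels, saturation curves, and the decay itself.

```python
import random


def adstock(spend, decay):
    """Geometric adstock: each week keeps `decay` of the previous week's effect."""
    carried, out = 0.0, []
    for s in spend:
        carried = s + decay * carried
        out.append(carried)
    return out


def ols(x, y):
    """Ordinary least squares for one regressor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(3)
DECAY, BASE, BETA = 0.6, 1000.0, 2.5  # hypothetical ground truth

# Two years of synthetic weekly data.
spend = [random.choice([0.0, 50.0, 100.0, 200.0]) for _ in range(104)]
x = adstock(spend, DECAY)
sales = [BASE + BETA * xi + random.gauss(0, 20) for xi in x]

intercept, slope = ols(x, sales)
incremental_sales = slope * sum(x)

print(f"baseline sales per week: {intercept:.0f}")
print(f"sales per unit of adstocked spend: {slope:.2f}")
print(f"share of sales attributed to media: {incremental_sales / sum(sales):.1%}")
```

The intercept plays the role of baseline sales that would have occurred without media. Whether that split is believable depends on the model's assumptions, which is why calibrating against experiments matters.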

Google launched Meridian in 2024. Meta has maintained Robyn since 2022. Both open-source. Analytic Partners, with their ROI Genome based on over two decades of cross-industry data, produces benchmarks that allow individual models to be calibrated against market patterns. The combination of MMM with targeted incrementality experiments (for calibration) represents, according to Lewis and Rao, "the most complete picture of advertising effectiveness."

The measurement triangle

No single method solves the problem. The most advanced practice combines three layers: MMM for systemic vision and budget allocation, incrementality experiments (Ghost Ads or geo-experiments) for model calibration and specific channel validation, and attribution only for tactical optimization within channels already validated. Anyone trying to solve everything with one layer ends up with decisions based on noise.

What this means for budget decisions

Binet and Field documented in IPA Effectiveness Awards data that the optimal ratio between brand building and activation hovers around 60/40. The problem: most attribution models assign returns almost exclusively to short-term activation, because brand building's incremental effect operates in time windows that no two-week A/B test captures.

The predictable result: budgets migrate toward performance. Brand building loses funding. Penetration slowly declines. And when the effects appear two years later, nobody connects cause and consequence. Byron Sharp, of the Ehrenberg-Bass Institute, demonstrated that brand growth depends on increasing penetration among light and occasional buyers. These buyers are the hardest to track with digital attribution and the most sensitive to long-term advertising effects.

The dashboard illusion becomes complete: what can be easily measured (clicks, short-term conversions) receives all the credit and all the budget. What actually works (building mental availability at scale) remains unfunded because a two-week A/B test declared its impact "inconclusive."

Five rules for measuring what matters

Incrementality checklist

1. Always include a pure holdout. Any test of advertising effectiveness needs a group that receives zero exposure to the channel being measured. Without a holdout, you are measuring optimization and calling it effectiveness.
2. Calculate statistical power before running. Define the minimum detectable effect that would be relevant to the business. Calculate the required sample size. If the required sample exceeds what you have, accept the limitation and use complementary methods.
3. Separate optimization from measurement. Use A/B tests to optimize creatives and landing pages within channels whose incrementality has already been established by other methods. Use incrementality experiments or MMM to decide whether a channel deserves investment.
4. Triangulate methods. MMM for allocation. Geo-experiments or lift studies for calibration. Attribution for tactical optimization. All three together. Separately, each one lies in different ways.
5. Respect the time windows. Brand building generates effects over months and years. Measuring in weeks produces false negatives that murder brand budgets. Run holdouts of 8-12 weeks minimum, and complement with MMM for long-term effects.

Measurement as competitive advantage

Companies that master incrementality do something competitors consider impossible: they cut investment in channels that appear to work (but generate zero incremental return) and redirect toward channels that seem invisible (but generate all new demand).

eBay discovered it was spending millions on branded Google ads that captured existing organic traffic. DoorDash implemented Ghost Ads to separate real incremental revenue from revenue that would have been generated anyway. Google and Meta published open-source MMM tools because they know that advanced measurement benefits advertisers who invest more, by precisely demonstrating where each dollar produces genuine return.

A/B testing remains a valuable tool for what it was designed to do: compare variants in a controlled environment. The error lies in asking it for answers it never promised to give. Asking "which creative performs better?" is legitimate. Asking "how much did this campaign generate in sales that would not exist without it?" requires different instruments.

The most expensive question in marketing endures: how much of what happened would have happened without us? Answering it precisely requires more than a pretty dashboard and a two-week test. It requires method, scale, and the patience to invest in measurement infrastructure before investing in media.

References

Academic studies

Lewis, R.A., Rao, J.M. (2015). "The Unfavorable Economics of Measuring the Returns to Advertising." The Quarterly Journal of Economics, 130(4), 1941-1973.
Blake, T., Nosko, C., Tadelis, S. (2015). "Consumer Heterogeneity and Paid Search Effectiveness: A Large-Scale Field Experiment." Econometrica, 83(1), 155-174.
Johnson, G., Lewis, R.A., Nubbemeyer, E. (2017). "Ghost Ads: Improving the Economics of Measuring Online Ad Effectiveness." Journal of Marketing Research, 54(6), 867-884.
Vaver, J., Koehler, J. (2011). "Measuring Ad Effectiveness Using Geo Experiments." Google Research.
Chen, A., Longfils, M., Remy, N. (2021). "Trimmed Match Design for Randomized Paired Geo Experiments." Google Research.
Kohavi, R., Tang, D., Xu, Y. (2020). "Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing." Cambridge University Press.

Frameworks and market data

Binet, L., Field, P. (2013). "The Long and the Short of It." IPA/WARC.
Sharp, B. (2010). "How Brands Grow." Oxford University Press.
Analytic Partners. "ROI Genome Intelligence Report." analyticpartners.com.
Google. "Meridian: Open-Source Marketing Mix Model." (2024).
Meta. "Robyn: Automated Marketing Mix Modeling." (2022).