Forget chasing viral ads — build a creative testing system that scales
How to engineer repeatable growth through systematic creative testing, for any budget

Summary
Creative testing should focus on scalable, repeatable processes — not post-hoc explanations of ad performance. Teams should establish clear baseline KPIs like CAC, spend, and conversion events, and adapt testing setups based on budget to optimize creative volume and increase ‘shots at goal’.
Creatives are one of the most powerful levers for growth — and with the speed and possibilities AI offers, their importance has only multiplied. We all know that producing more creatives should increase your odds of finding winners, but there’s a problem.
There’s too much noise. Every other post on LinkedIn introduces a new complex AI system that promises to ‘100x your creative process’. My feed is full of posts bragging about testing ‘thousands of creatives per week’. It sounds impressive, but it raises the question: do you need volume to compete?
Is the solution to automate creative and pump out hundreds of ads at a time? Or is this just the latest AI hype?
In this article, we’ll look at what actually matters in creative testing:
- Identifying which creatives move the growth needle
- Aligning your ad testing strategy with your budget
Stop producing for the sake of testing — start producing for the sake of scalability
Before you can identify winners and recreate that success, you need to have a good setup that serves your needs, while adapting to budget constraints.
This is my tried-and-tested setup that allows you to scale creative success by allocating more of your budget to winning creatives:
[Image: campaign setup that shifts budget toward winning creatives]
But that’s not the only setup that works for scalability.
If your limitations are tighter, ask yourself these two questions to figure out the best setup for your creative testing:
1. Do I already have winning creatives, or do I have to test from scratch?
This question determines your initial setup. If you’ve already run campaigns and seen what kind of creatives drive good performance, then you can start the new campaigns using these ideas. If not, your setup will have to continuously adapt until you discover the concepts that unlock growth.
2. How much money can I spend per day?
This answer will determine the regions and platforms you can target, the event that you can optimize for, and the number of campaigns, ad groups and creatives that make sense. Before diving into the mathematics of answering this question, let’s take a detour to determine what a winner looks like in terms of metrics.
Three types of ad creatives to identify
As with everything in life, ad success isn't black and white. There aren't just ‘winners’ and ‘losers’; there are grey areas in between.
Broadly speaking, here are the three types of ad creative you'll come up against.
Winning creatives are the ones that drastically improve performance across the board. They don't tank when you put more budget behind them, and they take much longer to burn out.
Poorly-performing creatives are pretty self-explanatory: they never perform at the level you need.
Then you have average creatives. This is the grey area — average creatives are those that get some spend and perform decently (though worse than winning ads), but you can’t scale them very aggressively. Average creatives are still important, as they add variety in your ad group and diversify your spend once winning creatives start to flag.
Within this group we also include the false positives and false negatives:
- False positives: Perform well when they have low spend, but perform badly when you force the algorithms to spend on them
- False negatives: Don’t get spend when among winning creatives, but can perform well when isolated in separate ad groups
TL;DR: the spend/traffic a creative receives is a critical variable that determines the best action to take on that specific ad.
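To make that concrete, here's a minimal sketch (an illustration, not a rule from any ad platform) of how you might turn those two signals, spend share and CPA versus your baseline, into a suggested next action. The 0.5 and 0.1 spend-share thresholds are assumptions for the example.

```python
def next_action(spend_share: float, cpa: float, baseline_cpa: float) -> str:
    """Suggest a next step for a creative from two signals:
    how much spend it attracts and how its CPA compares to your baseline.

    spend_share  - fraction of the ad group's daily spend it received (0.0-1.0)
    cpa          - its cost per target action so far
    baseline_cpa - the CPA your proven winners achieve
    Thresholds are illustrative, not prescriptive.
    """
    if spend_share >= 0.5 and cpa <= baseline_cpa:
        return "scale: likely winner, let it absorb more budget"
    if spend_share >= 0.5 and cpa > baseline_cpa:
        return "watch: possible false positive, pause if CPA stays high"
    if spend_share < 0.1 and cpa <= baseline_cpa:
        return "isolate: possible false negative, confirm in its own ad group"
    return "rotate: average or poor, plan to swap in a new concept"
```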
What does a winning creative look like for app growth?
The heart of this article is to lay out what a winning creative looks like — and since I'm a believer in educating through real data, here's real data showing the performance a winner brings to a campaign, compared to average creatives.
[Image: performance data comparing a winning creative with average creatives]
Impressive, right? With this creative, we reduced the cost-per-event by 65%, and it fully absorbed our spend. The creative also performed well in upper-funnel metrics like CPI, CTR, hook rate and hold rate.
This is what defines a real winning creative:
- They drastically improve performance, not only on your optimization goal, but also on engagement-related upper-funnel metrics
- They do this consistently, and for longer than other creatives
- They do this with a higher spend*
*In the example above, you see a short time because these winners were moved to isolated ad groups — more on this later!
How to establish a baseline that filters your winning creatives
Once you’ve got your creatives defined, you need to know how to rank their success. This means establishing a baseline for the different ad metrics and KPIs you’ll use to measure success and plan actions for ads. Your baseline KPIs will essentially tell you if your next creative has potential to be a winner.
This baseline should be determined by the winning creative — if you were able to produce a great performance with these, it means you can do it again (if you produce creatives of a similar quality!).
Here’s what I look at, in order from highest to lowest priority:
- Customer acquisition cost (CAC) / cost-per-acquisition (CPA): The primary action you’re optimizing for should be your north-star metric — keep this cost at or below your target, based on your business economics
- Spend: Winning creatives typically receive 80–95% of daily spend — if a creative receives less than 50% of the budget after two days, treat it as a likely loser or false positive
- Install to conversion event: Winning creatives convert to your optimization goal at a faster pace — this metric helps identify why some users install but fail to complete the target action
- Cost-per-install (CPI): Doesn’t need to be the lowest, but high-performing creatives usually deliver better-than-average CPI
- CTR: Indicates user intent and how effectively the creative captures attention
- Install rate: Measures how efficiently users who click go on to install the app — winning creatives should convert better than average creatives
- Hook rate: A critical indicator of early potential — normally much higher in winning creatives
- Hold rate: Measures how long users stay engaged; though it can fluctuate, it’s a strong signal of creative quality and retention
- Installs-per-mille (IPM): Winning creatives tend to consistently drive higher IPM
- Ad score: A broad summary of social engagement to identify which creatives drive meaningful interaction — to calculate it: (Reactions x 2) + (Comments x 5) + (Saves/Shares x 10)
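As a quick illustration of that last metric, here's the ad score calculation from the list expressed as a function; the engagement counts are whatever your platform export calls reactions, comments and saves/shares.

```python
def ad_score(reactions: int, comments: int, saves_or_shares: int) -> int:
    """Ad score as defined above: heavier engagement is weighted more strongly."""
    return reactions * 2 + comments * 5 + saves_or_shares * 10

# Example: 120 reactions, 14 comments and 9 saves/shares
print(ad_score(120, 14, 9))  # 240 + 70 + 90 = 400
```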
💡 A note on CAC/CPA
It’s important to consider which event your campaign is optimizing for. If you’re optimizing for an upper-funnel action like registration or trial start, you might see creatives with an impressively low CPA — but very poor conversion to paid subscription afterward.
This often happens when algorithms over-deliver ads to younger users (ages 18–24) who are curious enough to try the app but rarely subscribe.
In these cases, make sure the audience attracted by that creative aligns with your actual target segment. If not, monitor your conversion rate closely — a downward trend may signal that your best-performing creatives are simply attracting the wrong users.
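If you want a systematic check for this, here's a small sketch (field names like 'trials' and 'paid' are placeholders for your own reporting columns) that flags creatives whose trial-to-paid conversion lags well behind the account average, even if their CPA looks cheap.

```python
def flag_wrong_audience(creatives, min_ratio=0.7):
    """Flag creatives whose trial-to-paid conversion lags the account average.

    creatives: list of dicts with 'name', 'trials' and 'paid' counts
               (field names are assumptions for this sketch).
    min_ratio: how far below the account-wide rate a creative may fall
               before being flagged (0.7 = 30% worse than average).
    """
    total_trials = sum(c["trials"] for c in creatives)
    total_paid = sum(c["paid"] for c in creatives)
    account_rate = total_paid / total_trials if total_trials else 0.0

    flagged = []
    for c in creatives:
        rate = c["paid"] / c["trials"] if c["trials"] else 0.0
        if rate < account_rate * min_ratio:
            flagged.append((c["name"], round(rate, 3)))
    return account_rate, flagged
```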
For example, this is a snapshot of my Meta accounts:
[Image: snapshot of baseline metrics from the author's Meta ad accounts]
You can check other metrics like frequency, CPMs, cost per 1k accounts reached, or cost per 6s view, but the list above includes the ones I suggest using as a consistent baseline to measure success.
Creative ad optimization for every budget size: three testing frameworks
So, you know what a winner looks like by metrics, but how should you approach creative optimization at each budget stage?
From $0 to $500
So you have less room to test, but you can still make it work. In these cases, I always recommend testing one platform (normally iOS) and one GEO (normally the US), while optimizing for your main event (e.g. trial start, or direct purchase if you go with a hard paywall).
Your setup should be very simple: one campaign, and one ad group focused on the main event. If you split ad groups, you won’t generate enough events per day, meaning you won’t finish the learning phase and your performance will tank.
In terms of the number of creatives, I always go with eight–10, distributed depending on your answer to the first question above:
- If you already have winners from previous campaigns, run three–four winning creatives and two–three new concepts you want to test
- If you’re starting from scratch, simply invest in the best creatives you can produce
[Image: example single-campaign, single-ad-group setup for a sub-$500 budget]
If you run a channel like Meta or TikTok, you'll know within a couple of days whether the test concepts are winners, since these networks quickly push most spend towards the best performers.
There will be false positives and negatives, but if you push new concepts and they don't get spend, you can be almost certain they won't outperform your current top-spending creatives.
Whether you start from scratch or with pre-existing winners, you must rotate out the creatives that don't spend every two–three days, otherwise they'll never deliver good performance.
Also rotate winning assets if you start to see their KPIs worsen over time. Just as a test concept can become a winner, a winning asset can become a loser due to ad fatigue. Ultimately, there will always be a slot open for tests, either because the previous test hasn't spent or because a winner has fatigued.
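If it helps to see that rotation rule written down, here's a minimal sketch under assumed inputs: rotate anything that has gone roughly two–three days without meaningful spend, and rotate winners whose CPA has drifted past your target.

```python
def should_rotate(days_without_spend: int, cpa: float, target_cpa: float,
                  no_spend_limit: int = 3) -> bool:
    """Rotate a creative if it hasn't spent for ~2-3 days, or if a former
    winner's CPA has fatigued past the target. Thresholds are illustrative."""
    starved = days_without_spend >= no_spend_limit
    fatigued = cpa > target_cpa * 1.2  # 20% drift tolerance (an assumption)
    return starved or fatigued
```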
From $500 to $5,000
This is my favorite budget stage, as it allows you to split ad groups for testing purposes while still controlling performance and spending the majority of your budget on the best assets.
At this stage, you should already know which concepts are the best for your goal.
This is the setup I suggest, since you can launch three test ad groups and two BAU (business-as-usual) ad groups per campaign (assuming you run iOS campaigns with SKAN reporting). This allows you to put up to 30–50 creatives into testing every week, alongside up to 20 winners that absorb most of the budget.
You’ll likely see some testing assets get most of the spend, but with a worse CAC/CPA than your BAU assets. In this case, these should all be considered losers — pause them and rotate. (But remember to analyze and compare the engagement metrics, in case you can iterate on the idea and find a real winning asset)
[Image: example campaign structure with test ad groups and BAU ad groups]
Note: there are fewer ads pictured than I recommend, purely for clarity in the image
With this setup, you have the opportunity to double-confirm false positives and negatives in isolated ad groups. It's normal to get a lot of false positives. In this instance, there are two possibilities:
- If the asset performed better than BAU, create an isolated ad group (if necessary, create a new campaign if you don’t have more space in the existing campaign) and see how it evolves when you put a significant amount of spend on it. If it keeps performing better, it’s a real winner — make the most of it in the isolated ad group!
- If it quickly gets a much worse CAC than BAU, it means it was a false positive. As usual, don’t forget to check all the metrics in case the concept has winner potential.
False negatives will start to appear at this stage as well. The best approach is to follow the same strategy: isolate them in a new ad group and wait one–two days. Typically, you'll see poor performance when you force spend on them — if not, you've hit gold and identified a genuine false negative that turned into a winner!
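A compact way to encode that double-confirmation step, with assumed inputs, is to compare the asset's CAC in the shared test ad group and after isolation against your BAU benchmark. This is only a sketch of the decision logic described above.

```python
def confirm_in_isolation(test_cac: float, isolated_cac: float, bau_cac: float) -> str:
    """Second check for a suspected false positive or false negative.

    test_cac     - CAC the asset showed in the shared test ad group
    isolated_cac - CAC after 1-2 days in its own ad group with real spend
    bau_cac      - CAC of your business-as-usual (winning) ad groups
    """
    if isolated_cac <= bau_cac:
        return "real winner: keep scaling it in the isolated ad group"
    if test_cac <= bau_cac < isolated_cac:
        return "false positive: looked good at low spend, pause and iterate on the concept"
    return "loser: pause it, but review engagement metrics before discarding the idea"
```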
From $5,000 to infinity
At the higher end of budget, things get messy. You have a huge mass of assets to rotate, double-confirm, and scale all at once.
You’ll likely have different GEOs and multiple campaigns per GEO where you can have four–five BAU ad groups, 10–15 testing groups, and five–10 isolated groups to double-confirm false positives/negatives.
The number of ad groups that run as BAU is really determined by performance. In my experience, I've had accounts where I could run five campaigns with three BAU ad groups each, and others where ad fatigue on my winning creatives meant I had to rotate the winners faster — which obviously limited the number of BAUs.
This requires a ton of manual work, but it also accelerates your creative process proportionally. In these situations, what you really need to focus on is keeping your BAU ad groups' performance stable, since they're spending most of the budget.
If you see the CAC of these ad groups trending upwards, it's time to reduce the number of BAU ad groups and focus on optimizing their performance before adding more test ad groups. Otherwise, you risk rising CPA/CAC and undermining the logic behind your setup.
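One simple way to spot that upward drift early, shown here as an illustration over an assumed list of daily CAC values, is to compare the most recent few days against the previous window for each BAU ad group.

```python
def cac_trending_up(daily_cac: list[float], window: int = 3,
                    tolerance: float = 1.10) -> bool:
    """True if the latest `window` days' average CAC exceeds the previous
    window's average by more than `tolerance` (10% drift by default).

    daily_cac: chronological daily CAC values for one BAU ad group.
    Window size and tolerance are assumptions for this sketch.
    """
    if len(daily_cac) < 2 * window:
        return False  # not enough history to call it a trend
    recent = sum(daily_cac[-window:]) / window
    previous = sum(daily_cac[-2 * window:-window]) / window
    return previous > 0 and recent > previous * tolerance
```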
There's no perfect formula for ad wins
So there you have it — several detailed setups for optimizing your ads, and a breakdown of how to measure success.
However, no one can give you a perfect setup for every case. Everyone has techniques for winning ads, whether they're KPI formulas or AI tools — but each app is unique.
Your app and creatives have their own intricacies and idiosyncrasies, so don't be afraid to tweak this strategy to work for you. You might not have the budget to double-confirm assets, or maybe your winners perform well for longer than average, so you can leverage that performance without many rotations.
Stop focusing on creative volume, stop chasing winners without understanding them first — start with the basics, and learn along the way. There’s no better teacher than real data, and real experiments.