A practical guide to canary releases with feature flags

A canary release is when you ship a new version to a small subset of users first, watch for problems, and then expand. The name comes from coal mines, where the canary succumbs to the gas before the miners notice it. It is one of the most useful risk-reduction techniques in modern deployments.

Feature flags are the cleanest way to do canary releases. Here is the workflow we use.

The basic idea

You have a new version of a feature. Instead of releasing it to everyone at once, you release it to:

Internal users only (you and your team)
Then 1% of external users
Then 10%
Then 50%
Then 100%

At each step, you watch metrics. If they get worse, you roll back by flipping the flag, not by redeploying.
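
The ladder above can be sketched as a tiny state machine. This is only an illustration: the STAGES list mirrors the percentages in this post (with the internal-only stage modeled as 0% for external users), and nextRollout is a hypothetical helper, not part of any flag tool's API.

```typescript
// Rollout percentages for external users, in the order used in this post.
// The internal-only stage is represented as 0% (internal users are
// handled by a separate targeting rule).
const STAGES = [0, 1, 10, 50, 100] as const;

// Given the current percentage and whether the metrics look healthy,
// return the percentage to set next.
function nextRollout(current: number, healthy: boolean): number {
  // Unhealthy at any stage: roll back by flipping the flag to 0%.
  if (!healthy) return 0;
  // Healthy: advance to the first stage above the current one.
  const next = STAGES.find((stage) => stage > current);
  // Already at the top of the ladder: stay at 100%.
  return next ?? 100;
}
```

The point of writing it down is that expansion happens one step at a time, while rollback always goes straight to 0%.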

Setting it up

Create a flag with a percentage rollout. Start at 0%:

flagify flags create new-checkout-flow --type boolean
flagify flags set new-checkout-flow --env production --rollout 0

Wrap the new code path:

const useNewCheckout = flagify.isEnabled('new-checkout-flow')
return useNewCheckout ? <NewCheckout /> : <LegacyCheckout />

Deploy. The new code is in production but 0% of users see it.

Stage 1: internal users

Before the percentage rollout, you want to test with your own team. Create a segment for internal users (based on email domain, or an internal=true attribute):

Rule: email ends with "@yourcompany.com" → serve true
Otherwise: percentage rollout 0%

Your team gets the new version. External users get the old one. Use it for a day. Break it in new and exciting ways. Fix the bugs.
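
Conceptually, the rule above is evaluated top to bottom: the internal-user rule wins, and everyone else falls through to the percentage rollout. A minimal sketch of that order of evaluation follows. The User and FlagConfig shapes and the evaluate function are hypothetical and do not reflect Flagify's actual rule format; bucket stands for the user's stable position in [0, 100), however your flag tool computes it.

```typescript
// Hypothetical shapes for illustration only.
interface User {
  id: string;
  email: string;
}

interface FlagConfig {
  internalDomain: string; // e.g. "@yourcompany.com"
  rolloutPercent: number; // 0..100, applied to everyone else
}

// `bucket` is the user's stable position in [0, 100).
function evaluate(user: User, config: FlagConfig, bucket: number): boolean {
  // Rule: internal users always get the new version.
  const email = user.email.toLowerCase();
  if (email.endsWith(config.internalDomain.toLowerCase())) {
    return true;
  }
  // Otherwise: fall through to the percentage rollout.
  return bucket < config.rolloutPercent;
}
```

With the rollout at 0%, the fallback never returns true, so only the internal rule can turn the new path on.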

Stage 2: 1% of external users

Bump the rollout to 1%. This sounds small, but for most products it is enough to surface:

Errors that only happen with real user data
Performance problems at scale
Weird integrations with third-party tools
Browser-specific bugs you missed

Let it sit for at least a few hours. Overnight is better if you can wait.

What to watch

The metrics that matter are the ones that matter to users, not to your system. Specifically:

Error rate. If error rate on the new path is higher than the old path, something is wrong. Roll back.
P95 latency. If the new path is slower, investigate. A 10% regression is probably tolerable. A 50% regression is not.
Conversion / success metrics. For a checkout flow, watch conversion rate. For a sign-up form, watch completion rate. These are noisy at small sample sizes, but with enough traffic a drop as large as 30% is usually visible even at 1%.
Support tickets. If users in the new bucket are writing in, read what they say.

If you have an observability tool (Datadog, Sentry, Grafana), tag your metrics by flag variant so you can compare the two paths side by side.

const useNew = flagify.isEnabled('new-checkout-flow')
metrics.increment('checkout.started', {
  variant: useNew ? 'new' : 'legacy',
})
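
For the latency check, it helps to express the regression as a single number. Here is a rough sketch, assuming you can export raw latency samples (in milliseconds) per variant. The function names are made up for illustration, and the nearest-rank percentile is just one common definition; your observability tool may compute P95 differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Relative P95 change of the new path versus the legacy path.
// 0.10 means the new path's P95 is 10% slower; negative means faster.
function p95Regression(legacyMs: number[], newMs: number[]): number {
  const baseline = percentile(legacyMs, 95);
  return (percentile(newMs, 95) - baseline) / baseline;
}
```

A result around 0.10 falls in the "probably tolerable" range described above, while 0.50 does not.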

Stage 3 and beyond

If 1% looks good after a few hours, go to 10%. Then 50%. Then 100%. The speed depends on your risk tolerance and how much traffic you get.

For a low-traffic app, you may need to sit at each stage for a full day to collect enough data. For a high-traffic app, a few hours is often enough.
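
The arithmetic behind that is simple: the new path only sees your traffic multiplied by the rollout percentage. A back-of-the-envelope sketch follows; the hoursAtStage helper and the idea of a fixed target number of sessions are assumptions for illustration, not a rule.

```typescript
// Hours needed at a stage before the new path has served
// `targetSessions` sessions.
function hoursAtStage(
  sessionsPerHour: number,
  rolloutPercent: number,
  targetSessions: number,
): number {
  // Only this share of traffic reaches the new path.
  const canarySessionsPerHour = (sessionsPerHour * rolloutPercent) / 100;
  // At 0% (or with no traffic) the target is never reached.
  if (canarySessionsPerHour <= 0) return Infinity;
  return targetSessions / canarySessionsPerHour;
}
```

For example, hoursAtStage(100000, 1, 1000) is 1, while hoursAtStage(500, 10, 1000) is 20: the low-traffic app needs most of a day at 10% to collect what the high-traffic app collects in an hour at 1%.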

When to roll back

Roll back if:

Error rate on the new path is meaningfully higher than the old path
P95 latency is worse and users are complaining
A bug is blocking a specific workflow for a subset of users
Support volume spikes and the tickets trace back to the new path

Roll back by setting the flag rollout to 0%. No redeploy. No incident meeting. Just flip and figure it out.

flagify flags set new-checkout-flow --env production --rollout 0
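
What counts as "meaningfully higher" is a judgment call. One way to make it concrete is a two-proportion z-test on the error counts of the two paths. That test is an assumption chosen for illustration, not something this workflow prescribes, and the VariantCounts shape, the function names, and the default threshold of 3 are all hypothetical.

```typescript
interface VariantCounts {
  requests: number;
  errors: number;
}

// z-score for "the canary error rate is higher than the legacy error rate",
// using the pooled two-proportion z-test.
function errorRateZScore(legacy: VariantCounts, canary: VariantCounts): number {
  if (legacy.requests === 0 || canary.requests === 0) return 0; // no data yet
  const pLegacy = legacy.errors / legacy.requests;
  const pCanary = canary.errors / canary.requests;
  const pooled =
    (legacy.errors + canary.errors) / (legacy.requests + canary.requests);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / legacy.requests + 1 / canary.requests),
  );
  if (standardError === 0) return 0; // both paths all-success or all-failure
  return (pCanary - pLegacy) / standardError;
}

// Roll back when the canary is worse by more than `zThreshold` standard errors.
function shouldRollBack(
  legacy: VariantCounts,
  canary: VariantCounts,
  zThreshold = 3,
): boolean {
  return errorRateZScore(legacy, canary) > zThreshold;
}
```

A check like this keeps a handful of errors in a tiny bucket from triggering a rollback, while a clear jump still does. It does not replace reading the actual errors.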

After 100%

Once you have been at 100% for a sprint and nothing is on fire, remove the flag from the code. Delete the conditional. Collapse the two paths into one. This is the part teams skip, and it is how you end up with 200 stale flags in your codebase.

Read more on this in feature flag best practices.
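
To keep cleanup from slipping, you can periodically list the flags that are due for removal. A small sketch follows. The FlagStatus shape is hypothetical (you would fill it from wherever your flag state lives), and treating "a sprint" as 14 days is an assumption.

```typescript
// Hypothetical snapshot of a flag's state.
interface FlagStatus {
  key: string;
  rolloutPercent: number;
  lastChanged: Date; // when the rollout percentage was last modified
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the keys of flags that have sat at 100% for longer than maxAgeDays.
function staleFlags(flags: FlagStatus[], now: Date, maxAgeDays = 14): string[] {
  return flags
    .filter((flag) => flag.rolloutPercent === 100)
    .filter(
      (flag) => now.getTime() - flag.lastChanged.getTime() > maxAgeDays * MS_PER_DAY,
    )
    .map((flag) => flag.key);
}
```

Anything this returns is a candidate for deleting the conditional and the legacy path.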

When not to canary

Canary doesn’t work well when:

The new code depends on a schema migration that already ran. You can’t roll back by flipping a flag, because the schema is different. (Use expand-migrate-contract patterns instead.)
The new code and old code can’t coexist (e.g., they write to the same table in incompatible ways). You need a different strategy.
The thing you are shipping is a backend job that either runs or doesn’t. Canary is for user-facing code where you can route a percentage of traffic.

For those cases, smaller blast radius still matters, but flags alone don’t solve it.

Flagify supports percentage rollouts with deterministic hashing so users consistently see the same variant. See the quick start or read about targeting rules.
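
To make "deterministic hashing" concrete, here is a minimal sketch of how a percentage rollout can keep each user in a stable bucket. It is not Flagify's actual algorithm: the FNV-1a hash, the bucketFor and isInRollout names, and the flag-key-plus-user-ID input are all assumptions chosen for illustration.

```typescript
// Illustrative only: FNV-1a is an assumed hash, not necessarily what Flagify uses.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i); // mixes UTF-16 code units, fine for a sketch
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a (flag, user) pair to a stable bucket in [0, 100), in steps of 0.01.
function bucketFor(flagKey: string, userId: string): number {
  return (fnv1a(`${flagKey}:${userId}`) % 10000) / 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function isInRollout(
  flagKey: string,
  userId: string,
  rolloutPercent: number,
): boolean {
  return bucketFor(flagKey, userId) < rolloutPercent;
}
```

Because the bucket depends only on the flag key and the user ID, a user who is included at 1% stays included at 10% and 50%. Mixing in the flag key means a user's bucket is computed separately for each flag.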
