Kill switches in software: the safety net every production app needs

A kill switch is a special kind of feature flag. It exists not to roll out a new thing, but to turn off an existing thing, instantly, without a deploy. Its job is to be boring. It sits in your codebase for years, does nothing, and then one day saves you at 2am.

Every team I know that runs production systems regrets not having more of them.

What a kill switch looks like

if (!flagify.isEnabled('enable-email-sending', true)) {
  logger.warn('Email sending is disabled via kill switch')
  return { sent: false, skipped: true }
}

await sendEmail(message)

The fallback is true: if the flag service is unreachable, the feature stays on, which is what you want almost all of the time. You only flip the flag to false when something is actively on fire.

What deserves a kill switch

A useful question: if this system went into a bad state at 2am, how would you stop it from doing damage while you figure out what’s wrong?

If the answer is “roll back a deploy,” you probably want a kill switch. If the answer is “there’s nothing to stop, it’s fine,” you don’t.

Good candidates for kill switches:

Payment processing. Stripe has an incident and your retry loop is generating thousands of duplicate charges. Kill switch.
Outbound email/SMS. A bug causes you to send the same email 50 times. Kill switch.
Third-party integrations. The API you depend on is down and your code is retrying aggressively. Kill switch the feature that calls it.
Expensive features. An LLM call that costs money every time. Kill switch per tenant if abuse is detected.
Features with a known dependency on a fragile upstream. Kill switch the feature before the fragile thing breaks you.
New or risky database migrations. Kill switch the code path that uses the new schema.
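
For the retry-loop cases in the list above (payments, third-party integrations), the kill switch only helps if the loop re-reads the flag on every attempt instead of once at startup. A minimal sketch, assuming a hypothetical `FlagReader` function with the same (name, fallback) shape as the `isEnabled` calls in this post; `retryWithKillSwitch` and `RetryResult` are illustrative names, not part of any SDK:

```typescript
// Hypothetical flag lookup with the same (name, fallback) shape used in this post.
type FlagReader = (name: string, fallback: boolean) => boolean

interface RetryResult {
  status: 'succeeded' | 'killed' | 'exhausted'
  attempts: number
}

// Retries `operation` up to `maxAttempts` times, but stops as soon as the
// kill switch is flipped off. `operation` returns true on success.
function retryWithKillSwitch(
  isEnabled: FlagReader,
  flagName: string,
  operation: () => boolean,
  maxAttempts: number,
): RetryResult {
  let attempts = 0
  while (attempts < maxAttempts) {
    // Re-read the flag before every attempt so a flip stops an in-flight loop.
    // Fallback is true: a flag-service outage should not halt the operation.
    if (!isEnabled(flagName, true)) {
      return { status: 'killed', attempts }
    }
    attempts += 1
    if (operation()) {
      return { status: 'succeeded', attempts }
    }
  }
  return { status: 'exhausted', attempts }
}
```

A real loop would also back off between attempts; that is left out here to keep the flag check visible.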

What does not deserve a kill switch

Static rendering. Simple CRUD. Read paths that are cached. Features that are off the happy path. Things that, if they break, degrade gracefully and can wait for a normal rollback.

You can have too many kill switches. Every flag is a branch in your code that must be tested. Aim for maybe a dozen, on the things that can actually hurt you.

Design patterns

Fail-safe default

The fallback value when the flag service is unreachable should be the safer state. For a kill switch, that is usually true (feature on), because your flag service being down shouldn’t take down your payment system. But for a new risky feature, you might want false as the default until the flag service confirms it should be on.

// Kill switch: fail-safe is "on"
const paymentsEnabled = flagify.isEnabled('enable-payments', true)

// Launch flag: fail-safe is "off"
const newFeatureEnabled = flagify.isEnabled('new-risky-feature', false)
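
To make the fallback behavior concrete, here is a rough sketch of what resolving a flag with a fallback can look like. It assumes a hypothetical `RemoteLookup` function that throws when the flag service is unreachable and returns undefined for an unknown flag; it illustrates the pattern and is not how any particular SDK is implemented:

```typescript
// Hypothetical lookup: throws if the flag service is unreachable,
// returns undefined if the flag does not exist.
type RemoteLookup = (name: string) => boolean | undefined

// Returns the remote value when available, otherwise the caller's fallback.
function resolveFlag(lookup: RemoteLookup, name: string, fallback: boolean): boolean {
  try {
    const value = lookup(name)
    // Treat an unknown flag like an outage: use the fallback.
    return value === undefined ? fallback : value
  } catch {
    // Flag service unreachable: the fallback decides the behavior.
    return fallback
  }
}
```

With an unreachable service, the same function returns true for the kill switch and false for the launch flag, purely because of the fallback each call site chose.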

Graceful degradation

When the kill switch is flipped, show a user-friendly message instead of an error.

if (!flagify.isEnabled('enable-search', true)) {
  return (
    <SearchUnavailable
      message="Search is temporarily unavailable. We're working on it."
    />
  )
}
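
The same idea applies to an API endpoint: return a structured "temporarily unavailable" response instead of letting the request fail with a generic error. A minimal sketch, assuming a hypothetical `handleSearch` handler and `HttpResponse` shape; the 300-second `Retry-After` value is an arbitrary placeholder:

```typescript
interface HttpResponse {
  status: number
  headers: { [name: string]: string }
  body: unknown
}

// Hypothetical handler: `searchEnabled` is the already-evaluated kill switch.
function handleSearch(
  searchEnabled: boolean,
  query: string,
  runSearch: (q: string) => string[],
): HttpResponse {
  if (!searchEnabled) {
    // 503 plus Retry-After tells clients this is temporary, not a bug.
    return {
      status: 503,
      headers: { 'Retry-After': '300' },
      body: {
        error: 'search_unavailable',
        message: "Search is temporarily unavailable. We're working on it.",
      },
    }
  }
  return { status: 200, headers: {}, body: { results: runSearch(query) } }
}
```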

Scoped kill switches

Sometimes you don’t want to kill a feature globally, just for one user or one tenant. Use targeting rules:

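// Targeting rule on the flag: tenants in the high-cost-tenants segment evaluate to false.
// Assumes the flag client already knows which tenant this request belongs to.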
const llmEnabled = flagify.isEnabled('enable-llm-features', true)

You add the misbehaving tenant to the segment, and the flag evaluates to false for just that tenant. The rest of the system continues as normal.
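
Conceptually, the evaluation behind a scoped kill switch is a per-tenant override on top of a default. The sketch below uses a hypothetical in-memory `FlagTable` to show that logic; a real flag service would evaluate its targeting rules for you, and the names here are illustrative only:

```typescript
interface ScopedKillSwitch {
  defaultValue: boolean
  // Tenants listed here evaluate to false regardless of the default.
  disabledTenants: string[]
}

// Hypothetical in-memory stand-in for the flag service's rule store.
type FlagTable = { [flagName: string]: ScopedKillSwitch | undefined }

function isEnabledForTenant(
  flags: FlagTable,
  flagName: string,
  tenantId: string,
  fallback: boolean,
): boolean {
  const flag = flags[flagName]
  if (flag === undefined) {
    return fallback // unknown flag: behave as if the service were unreachable
  }
  if (flag.disabledTenants.indexOf(tenantId) !== -1) {
    return false // this tenant has been switched off
  }
  return flag.defaultValue
}
```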

Mistakes we’ve seen

Not having a kill switch on the thing that took down prod last time. Every post-mortem mentions “we should have had a flag for this” and then nobody adds it.

Killing too much at once. You have one flag called enable-api that gates 80% of your code. Now you can’t kill any of it without killing all of it. Split the flag.

Forgetting the kill switch exists. Six months after the launch, nobody remembers that the flag is there. When the bad day comes, the on-call engineer flips the wrong thing. Document kill switches in your runbook.

Treating kill switches like release flags. Kill switches are permanent. They do not get cleaned up. Your flag management tool should distinguish them so they don’t show up in “stale flag” alerts.

Not testing the kill switch path. The code path that runs when the flag is off might be years old and broken. Test it. Regularly.
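
One way to keep the off path honest is a test that stubs the flag to false and checks that the guarded side effect never runs. A minimal sketch, assuming a hypothetical `sendIfEnabled` wrapper modeled on the email example at the top of this post, with the flag reader and sender injected so they can be stubbed:

```typescript
// Hypothetical flag lookup with the same (name, fallback) shape used in this post.
type FlagReader = (name: string, fallback: boolean) => boolean

interface SendResult {
  sent: boolean
  skipped: boolean
}

// Guarded send: the flag reader and the sender are injected for testability.
function sendIfEnabled(isEnabled: FlagReader, send: () => void): SendResult {
  if (!isEnabled('enable-email-sending', true)) {
    return { sent: false, skipped: true }
  }
  send()
  return { sent: true, skipped: false }
}

// Exercises the flag-off path: returns true only if nothing was sent.
function checkKillSwitchOffPath(): boolean {
  let sendCalls = 0
  const flagOff: FlagReader = () => false
  const result = sendIfEnabled(flagOff, () => {
    sendCalls += 1
  })
  return result.skipped && !result.sent && sendCalls === 0
}
```

Running a check like this in CI means the off path gets executed on every build, not just during an incident.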

How we use them at Flagify

We have kill switches on: outbound email, SSE broadcast, billing webhooks, LLM calls, and a few expensive analytics queries. Most of them have never been flipped. Three of them saved us from incidents in the last year.

We also have a convention that any new feature that touches a third-party API or costs money per request gets a kill switch before it ships. It is not negotiable.

Feature flags are not just for launches. Read more about feature flag types and lifecycles or feature flag best practices.

Flagify gives you kill switches, gradual rollouts, and targeting in the same tool. Set up your first flag in under five minutes with the quick start guide.

Start for free — no credit card required.