Deployment Strategy Selection
Decision tree
devops, deployment, reliability, sre, infrastructure, engineering
Deployment strategy is one of the highest-leverage engineering decisions for system reliability — the wrong choice turns every release into a stressful event, while the right choice makes deployments routine. This tree routes you to the pattern that matches your infrastructure capabilities, rollback requirements, and confidence level in each release.
Overview
Decision Tree
Start: Can your deployment target run two versions of the application simultaneously with a load balancer routing traffic between them?
yes
- Continues to question: Do you need to validate the new version against a fraction of real production traffic before committing to a full rollout?
no
- Outcome: BIG-BANG
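The branches above show only the first question. As a minimal sketch, the full routing defined in the DSL Representation section can be written as a lookup table and walked with yes/no answers. The names `TREE` and `decide` are illustrative and are not part of the DrawDecisionTree export.

```python
from typing import Mapping

# Edges copied from the tree's DSL; TREE and decide() are illustrative names.
TREE = {
    "Q1": {"yes": "Q2", "no": "BIG-BANG"},
    "Q2": {"yes": "Q3", "no": "Q4"},
    "Q3": {"yes": "CANARY", "no": "FEATURE-FLAGS"},
    "Q4": {"yes": "BLUE-GREEN", "no": "ROLLING"},
}


def decide(answers: Mapping[str, bool], entry: str = "Q1") -> str:
    """Follow yes/no answers from the entry question until an outcome is reached."""
    node = entry
    while node in TREE:
        if node not in answers:
            raise KeyError(f"no answer supplied for {node}")
        node = TREE[node]["yes" if answers[node] else "no"]
    return node


if __name__ == "__main__":
    # Two versions can run side by side, no production-traffic validation
    # is needed, and instant rollback is not required.
    print(decide({"Q1": True, "Q2": False, "Q4": False}))  # ROLLING
```

Only the questions on the chosen path need an answer, so a "no" to the first question resolves immediately to BIG-BANG.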
Machine-Readable JSON (Canonical Model)
{
"_meta": {
"schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"source": "https://www.drawdecisiontree.com",
"description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
"links": {
"interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/deployment-strategy.html",
"embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/deployment-strategy",
"dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
"guides": "https://www.drawdecisiontree.com/guides",
"schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
},
"generated_at": "2026-05-29T12:05:39.280Z"
},
"author": {
"handle": "drawdecisiontree",
"first_name": "Andrew",
"last_name": null,
"avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
"display_name": "Andrew"
},
"file": {
"id": "f8a099d2-c276-42da-8856-6f2be9d32f88",
"name": "Deployment Strategy Selection",
"public_slug": "deployment-strategy",
"updated_at": "2026-05-12T16:53:43.587978+00:00",
"url": "https://www.drawdecisiontree.com/t/drawdecisiontree/deployment-strategy.html",
"json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/deployment-strategy/tree.json",
"dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/deployment-strategy/tree.dag"
},
"meta": {
"description": "Deployment strategy is one of the highest-leverage engineering decisions for system reliability — the wrong choice turns every release into a stressful event, while the right choice makes deployments routine. This tree routes you to the pattern that matches your infrastructure capabilities, rollback requirements, and confidence level in each release.",
"mode": "decision",
"entry": "Q1",
"tags": [
"devops",
"deployment",
"reliability",
"sre",
"infrastructure",
"engineering"
],
"image": "https://images.unsplash.com/photo-1558618666-fcd25c85cd64?w=1200&q=80"
},
"questions": [
{
"id": "Q1",
"text": "Can your deployment target run two versions of the application simultaneously with a load balancer routing traffic between them?"
},
{
"id": "Q2",
"text": "Do you need to validate the new version against a fraction of real production traffic before committing to a full rollout?"
},
{
"id": "Q3",
"text": "Can your infrastructure split traffic by percentage — routing, say, 5% of requests to the new version while 95% continue to the old?"
},
{
"id": "Q4",
"text": "Do you need instant rollback — reverting to the previous version in seconds by switching a load balancer, with no redeployment required?"
}
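],
"outcomes": [
{
"id": "BLUE-GREEN",
"label": "Blue-Green Deployment"
},
{
"id": "CANARY",
"label": "Canary Release"
},
{
"id": "ROLLING",
"label": "Rolling Update"
},
{
"id": "FEATURE-FLAGS",
"label": "Feature Flag Release"
},
{
"id": "BIG-BANG",
"label": "Big Bang Deployment"
}
],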
"dsl": "dag: Deployment Strategy Selection\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1558618666-fcd25c85cd64?w=1200&q=80\ndescription: Deployment strategy is one of the highest-leverage engineering decisions for system reliability — the wrong choice turns every release into a stressful event, while the right choice makes deployments routine. This tree routes you to the pattern that matches your infrastructure capabilities, rollback requirements, and confidence level in each release.\ntags: devops, deployment, reliability, sre, infrastructure, engineering\nentry: Q1\n\nQ1: Can your deployment target run two versions of the application simultaneously with a load balancer routing traffic between them?\n hint: Zero-downtime deployments require your infrastructure to support parallel versions — at minimum two instances and a load balancer or service mesh. If you have a single server, shared mutable state between instances (file system, in-process cache), or a database migration that requires both versions to be offline simultaneously, answer No.\n yes -> Q2\n no -> [BIG-BANG]\n\nQ2: Do you need to validate the new version against a fraction of real production traffic before committing to a full rollout?\n hint: Traffic-splitting validation is valuable when the code change carries meaningful risk and real-user behaviour is the only reliable signal — new algorithms, pricing changes, significant UX rewrites, or anything where staging environment testing can't reproduce true production load patterns.\n yes -> Q3\n no -> Q4\n\nQ3: Can your infrastructure split traffic by percentage — routing, say, 5% of requests to the new version while 95% continue to the old?\n hint: Weighted routing requires a feature flag system (LaunchDarkly, Unleash), a service mesh (Istio, Linkerd), or a load balancer that supports traffic splitting (AWS ALB weighted target groups, NGINX split clients). 
If your infrastructure only supports an all-or-nothing cut-over, answer No.\n yes -> [CANARY]\n no -> [FEATURE-FLAGS]\n\nQ4: Do you need instant rollback — reverting to the previous version in seconds by switching a load balancer, with no redeployment required?\n hint: Blue-green provides the fastest rollback but requires double the infrastructure capacity for the switchover window. If infrastructure cost is a constraint, or if your deployment pipeline is fast enough that redeploying the previous image in 2–5 minutes is acceptable, a rolling update is more cost-efficient.\n yes -> [BLUE-GREEN]\n no -> [ROLLING]\n\n[BLUE-GREEN]: Blue-Green Deployment\n color: #0077CC\n description: Blue-green deployment maintains two identical production environments — blue (currently live) and green (idle) — and releases by switching the load balancer from blue to green. The previous environment stays intact and reachable for an immediate rollback: if problems surface post-switch, a single load balancer or DNS change reverts traffic in seconds without any redeployment. This pattern eliminates deployment risk for end users and makes rollback a trivial operation. The cost is double the infrastructure capacity for the switchover window, and it works cleanly only when database migrations are backwards-compatible — both environments share the same database, so the schema must support both the old and new application version simultaneously. Teams on AWS typically implement this with Elastic Beanstalk environment swaps, ECS blue/green deployments, or ALB weighted target group switching.\n code: DEPLOY_BLUE_GREEN\n\n[CANARY]: Canary Release\n color: #F5A623\n description: A canary release routes a small, controlled percentage of production traffic to the new version while the majority continues hitting the stable version. 
Metrics — error rate, latency percentiles, business KPIs — are monitored on the canary slice; if they remain within acceptable bounds the percentage is gradually increased (1% → 5% → 25% → 100%) until full promotion. If metrics degrade, the canary is pulled back with minimal user impact. This is the most rigorous deployment pattern for high-traffic systems where even a brief full-scale degradation would be costly. It is the standard approach at companies like Netflix, Google, and Stripe. Canary releases require weighted traffic routing (service mesh, feature flags, or ALB weighted target groups) and automated rollout and rollback triggered by SLO-aligned alerts. The flip side: canary validation extends your deployment window from minutes to hours or days for large changes.\n code: DEPLOY_CANARY\n\n[ROLLING]: Rolling Update\n color: #7B68EE\n description: A rolling update replaces application instances one at a time (or in small batches), ensuring that at least some capacity remains on the old version throughout the deployment. Kubernetes uses rolling updates by default — the deployment controller increments new pods while decrementing old ones, respecting the configured `maxSurge` and `maxUnavailable` budgets. The key advantage over blue-green is that rolling updates require no additional infrastructure capacity — you upgrade in place. The trade-off is a longer window during which old and new code run simultaneously, which requires backwards-compatible APIs and database schemas. Rollback is also slower than blue-green: reverting requires redeploying the previous image rather than simply rerouting traffic. 
Rolling updates are the right default for most Kubernetes-hosted services where immediate rollback is not a hard requirement.\n code: DEPLOY_ROLLING\n\n[FEATURE-FLAGS]: Feature Flag Release\n color: #27AE60\n description: Feature flag releases decouple deployment from release — the new code ships to all servers at once but is hidden behind a flag, giving product and engineering full control over when and to whom the feature becomes visible. Flags can target specific users, organisations, beta cohorts, or percentage rollouts through a feature management platform (LaunchDarkly, Unleash, Flagsmith, or a homegrown system). This pattern is particularly powerful for gradual rollouts, A/B tests, kill switches, and migrations where you want to reach internal users first, then beta customers, then general availability — all without additional deployments. The principal risk is flag debt: features whose flags are never removed accumulate as conditional code paths that complicate testing and increase cognitive overhead. Establish a convention for flag lifecycle — create, reach GA, remove within one quarter — before adopting this at scale.\n code: DEPLOY_FEATURE_FLAGS\n\n[BIG-BANG]: Big Bang Deployment\n color: #E74C3C\n description: A big bang deployment stops the old version, deploys the new version, and restarts — resulting in a planned maintenance window with user-visible downtime. While often labelled an anti-pattern, it is the correct and practical choice in several contexts: single-server deployments without a load balancer, database schema changes that are not backwards-compatible and require a coordinated cutover, or systems with in-process state that cannot coexist between versions. 
The key discipline for big bang deployments is thorough preparation: validate in a staging environment that mirrors production, script and test the rollback procedure in advance, communicate the maintenance window to users, and execute using a single idempotent script rather than a sequence of manual steps under time pressure. Schedule during the lowest-traffic window and define a hard go/no-go decision point before the deployment begins.\n code: DEPLOY_BIG_BANG\n"
}
DSL Representation
dag: Deployment Strategy Selection
version: 1.0.0
image: https://images.unsplash.com/photo-1558618666-fcd25c85cd64?w=1200&q=80
description: Deployment strategy is one of the highest-leverage engineering decisions for system reliability — the wrong choice turns every release into a stressful event, while the right choice makes deployments routine. This tree routes you to the pattern that matches your infrastructure capabilities, rollback requirements, and confidence level in each release.
tags: devops, deployment, reliability, sre, infrastructure, engineering
entry: Q1
Q1: Can your deployment target run two versions of the application simultaneously with a load balancer routing traffic between them?
hint: Zero-downtime deployments require your infrastructure to support parallel versions — at minimum two instances and a load balancer or service mesh. If you have a single server, shared mutable state between instances (file system, in-process cache), or a database migration that requires both versions to be offline simultaneously, answer No.
yes -> Q2
no -> [BIG-BANG]
Q2: Do you need to validate the new version against a fraction of real production traffic before committing to a full rollout?
hint: Traffic-splitting validation is valuable when the code change carries meaningful risk and real-user behaviour is the only reliable signal — new algorithms, pricing changes, significant UX rewrites, or anything where staging environment testing can't reproduce true production load patterns.
yes -> Q3
no -> Q4
Q3: Can your infrastructure split traffic by percentage — routing, say, 5% of requests to the new version while 95% continue to the old?
hint: Weighted routing requires a feature flag system (LaunchDarkly, Unleash), a service mesh (Istio, Linkerd), or a load balancer that supports traffic splitting (AWS ALB weighted target groups, the NGINX `split_clients` directive). If your infrastructure only supports an all-or-nothing cut-over, answer No.
yes -> [CANARY]
no -> [FEATURE-FLAGS]
Q4: Do you need instant rollback — reverting to the previous version in seconds by switching a load balancer, with no redeployment required?
hint: Blue-green provides the fastest rollback but requires double the infrastructure capacity for the switchover window. If infrastructure cost is a constraint, or if your deployment pipeline is fast enough that redeploying the previous image in 2–5 minutes is acceptable, a rolling update is more cost-efficient.
yes -> [BLUE-GREEN]
no -> [ROLLING]
[BLUE-GREEN]: Blue-Green Deployment
color: #0077CC
description: Blue-green deployment maintains two identical production environments, blue (currently live) and green (idle), and releases by switching the load balancer from blue to green. The previous environment stays intact and reachable for an immediate rollback: if problems surface post-switch, a single load balancer change reverts traffic in seconds without any redeployment. A DNS-based switch also works, but it takes effect only as cached records expire, so keep TTLs short if you rely on it. This pattern greatly reduces deployment risk for end users and makes rollback a trivial operation. The cost is double the infrastructure capacity for the switchover window, and it works cleanly only when database migrations are backwards-compatible: both environments share the same database, so the schema must support both the old and new application versions simultaneously. Teams on AWS typically implement this with Elastic Beanstalk environment swaps, ECS blue/green deployments, or ALB weighted target group switching.
code: DEPLOY_BLUE_GREEN
[CANARY]: Canary Release
color: #F5A623
description: A canary release routes a small, controlled percentage of production traffic to the new version while the majority continues hitting the stable version. Metrics — error rate, latency percentiles, business KPIs — are monitored on the canary slice; if they remain within acceptable bounds the percentage is gradually increased (1% → 5% → 25% → 100%) until full promotion. If metrics degrade, the canary is pulled back with minimal user impact. This is the most rigorous deployment pattern for high-traffic systems where even a brief full-scale degradation would be costly. It is the standard approach at companies like Netflix, Google, and Stripe. Canary releases require weighted traffic routing (service mesh, feature flags, or ALB weighted target groups) and automated rollout and rollback triggered by SLO-aligned alerts. The flip side: canary validation extends your deployment window from minutes to hours or days for large changes.
code: DEPLOY_CANARY
[ROLLING]: Rolling Update
color: #7B68EE
description: A rolling update replaces application instances one at a time (or in small batches), so that most capacity stays in service throughout the deployment. Kubernetes uses rolling updates by default: the Deployment controller scales up the new ReplicaSet while scaling down the old one, respecting the configured `maxSurge` and `maxUnavailable` budgets. The key advantage over blue-green is that rolling updates need little or no additional infrastructure capacity (at most the `maxSurge` headroom) because you upgrade in place. The trade-off is a longer window during which old and new code run simultaneously, which requires backwards-compatible APIs and database schemas. Rollback is also slower than blue-green: reverting means rolling the previous version back out (for example with `kubectl rollout undo`) rather than simply rerouting traffic. Rolling updates are the right default for most Kubernetes-hosted services where immediate rollback is not a hard requirement.
code: DEPLOY_ROLLING
[FEATURE-FLAGS]: Feature Flag Release
color: #27AE60
description: Feature flag releases decouple deployment from release — the new code ships to all servers at once but is hidden behind a flag, giving product and engineering full control over when and to whom the feature becomes visible. Flags can target specific users, organisations, beta cohorts, or percentage rollouts through a feature management platform (LaunchDarkly, Unleash, Flagsmith, or a homegrown system). This pattern is particularly powerful for gradual rollouts, A/B tests, kill switches, and migrations where you want to reach internal users first, then beta customers, then general availability — all without additional deployments. The principal risk is flag debt: features whose flags are never removed accumulate as conditional code paths that complicate testing and increase cognitive overhead. Establish a convention for flag lifecycle — create, reach GA, remove within one quarter — before adopting this at scale.
code: DEPLOY_FEATURE_FLAGS
[BIG-BANG]: Big Bang Deployment
color: #E74C3C
description: A big bang deployment stops the old version, deploys the new version, and restarts — resulting in a planned maintenance window with user-visible downtime. While often labelled an anti-pattern, it is the correct and practical choice in several contexts: single-server deployments without a load balancer, database schema changes that are not backwards-compatible and require a coordinated cutover, or systems with in-process state that cannot coexist between versions. The key discipline for big bang deployments is thorough preparation: validate in a staging environment that mirrors production, script and test the rollback procedure in advance, communicate the maintenance window to users, and execute using a single idempotent script rather than a sequence of manual steps under time pressure. Schedule during the lowest-traffic window and define a hard go/no-go decision point before the deployment begins.
code: DEPLOY_BIG_BANG
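The DSL above is regular enough to read with a few patterns. What follows is a rough sketch of a reader, assuming only the constructs visible in this tree (question lines, `yes`/`no` edges, and bracketed outcome lines). It is not the official parser, and `parse_dag`, `SAMPLE`, and the regular expressions are hypothetical names introduced for illustration.

```python
import re

# Patterns for the three line types this sketch understands.
QUESTION = re.compile(r"^(Q\d+):\s*(.+)$")
OUTCOME = re.compile(r"^\[([A-Z0-9-]+)\]:\s*(.+)$")
EDGE = re.compile(r"^(yes|no)\s*->\s*\[?([A-Za-z0-9-]+)\]?$")


def parse_dag(text: str) -> dict:
    """Extract questions, yes/no edges, and outcome labels from DSL text.

    Header, hint, color, description, and code lines are skipped.
    """
    questions, edges, outcomes = {}, {}, {}
    current = None  # id of the question whose edges are being read
    for raw in text.splitlines():
        line = raw.strip()
        if m := QUESTION.match(line):
            current = m.group(1)
            questions[current] = m.group(2)
            edges[current] = {}
        elif m := OUTCOME.match(line):
            current = None
            outcomes[m.group(1)] = m.group(2)
        elif (m := EDGE.match(line)) and current is not None:
            edges[current][m.group(1)] = m.group(2)
    return {"questions": questions, "edges": edges, "outcomes": outcomes}


# A shortened, made-up excerpt in the same shape as the DSL above.
SAMPLE = """\
entry: Q1
Q1: Can two versions run side by side?
  hint: ignored by this sketch
  yes -> Q4
  no -> [BIG-BANG]
Q4: Do you need instant rollback?
  yes -> [BLUE-GREEN]
  no -> [ROLLING]
[ROLLING]: Rolling Update
  code: DEPLOY_ROLLING
"""

if __name__ == "__main__":
    print(parse_dag(SAMPLE)["edges"])
```

Edge targets are stored without their square brackets, so a target that is not a key in `questions` can be treated as an outcome id.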
Machine Access
- Static JSON: /t/drawdecisiontree/deployment-strategy/tree.json
- Live JSON (SPA): /json/drawdecisiontree/deployment-strategy
- Raw DSL: /t/drawdecisiontree/deployment-strategy/tree.dag
- Canonical HTML: /t/drawdecisiontree/deployment-strategy.html
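A script can consume the static JSON endpoint directly. The sketch below assumes the export keeps the shape shown in the Machine-Readable JSON section (`meta.entry`, `questions`, `outcomes`). The URL is the `json_url` value from that export, while `fetch_export` and `summarize` are illustrative helper names.

```python
import json
from urllib.request import urlopen

# json_url as given in the export's "file" section.
TREE_JSON_URL = (
    "https://www.drawdecisiontree.com"
    "/t/drawdecisiontree/deployment-strategy/tree.json"
)


def fetch_export(url: str = TREE_JSON_URL) -> dict:
    """Download and decode the static JSON export (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize(export: dict) -> dict:
    """Pull the entry node, question texts, and outcome labels out of an export."""
    return {
        "entry": export["meta"]["entry"],
        "questions": {q["id"]: q["text"] for q in export["questions"]},
        "outcomes": {o["id"]: o["label"] for o in export["outcomes"]},
    }
```

Calling `summarize(fetch_export())` returns the three pieces most scripts need. Keeping the network call in its own function lets `summarize` be exercised offline against a saved copy of the file.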
Questions in this decision tree
- Can your deployment target run two versions of the application simultaneously with a load balancer routing traffic between them?
- Do you need to validate the new version against a fraction of real production traffic before committing to a full rollout?
- Can your infrastructure split traffic by percentage — routing, say, 5% of requests to the new version while 95% continue to the old?
- Do you need instant rollback — reverting to the previous version in seconds by switching a load balancer, with no redeployment required?
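The percentage split in the third question can be illustrated at the application level. This is a sketch of one common technique, hashing a stable key into buckets, and not a description of how any particular load balancer or service mesh implements weighting. The function name and the 10,000-bucket granularity are assumptions.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: float) -> bool:
    """Deterministically decide whether a key falls in the canary slice.

    The key (a user or session id, say) is hashed into one of 10,000
    buckets; keys in the lowest `canary_percent` percent of buckets are
    sent to the new version, everything else stays on the old one.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100


if __name__ == "__main__":
    hits = sum(routes_to_canary(f"user-{i}", 5) for i in range(10_000))
    print(f"{hits} of 10000 keys routed to the new version")
```

Because the decision depends only on the key, a given user stays on the same version for the whole rollout, and raising the percentage only ever adds users to the canary slice.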
Possible outcomes
- Blue-Green Deployment
- Canary Release
- Rolling Update
- Feature Flag Release
- Big Bang Deployment
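For the Rolling Update outcome, the `maxSurge` and `maxUnavailable` budgets named in its description bound how many pods exist and how many stay available during a rollout. The sketch below works through that arithmetic, assuming the documented Kubernetes behaviour for percentages (surge rounds up, unavailable rounds down, both default to 25%). `rolling_bounds` is an illustrative helper, not a Kubernetes API.

```python
from typing import Tuple, Union

Budget = Union[int, str]  # an absolute pod count or a percentage such as "25%"


def rolling_bounds(
    replicas: int,
    max_surge: Budget = "25%",
    max_unavailable: Budget = "25%",
) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""

    def resolve(value: Budget, round_up: bool) -> int:
        if isinstance(value, str) and value.endswith("%"):
            scaled = replicas * int(value[:-1])
            # Integer arithmetic avoids floating-point rounding surprises.
            return -(-scaled // 100) if round_up else scaled // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    print(rolling_bounds(10))  # (8, 13)
```

With 10 replicas and the default budgets, at least 8 pods stay available and at most 13 exist at once, which is the extra headroom a rolling update needs compared with the doubled capacity of blue-green.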
How to use this decision tree
Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.