{
  "_meta": {
    "schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
    "source": "https://www.drawdecisiontree.com",
    "description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
    "links": {
      "interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/container-orchestration.html",
      "embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/container-orchestration",
      "dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
      "guides": "https://www.drawdecisiontree.com/guides",
      "schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
      "author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
    },
    "generated_at": "2026-05-29T12:05:39.257Z"
  },
  "author": {
    "handle": "drawdecisiontree",
    "first_name": "Andrew",
    "last_name": null,
    "avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
    "display_name": "Andrew"
  },
  "file": {
    "id": "30e14637-fe26-4bff-9a75-532fb237ecac",
    "name": "Container Orchestration Platform Selection",
    "public_slug": "container-orchestration",
    "updated_at": "2026-05-12T16:53:43.587978+00:00",
    "url": "https://www.drawdecisiontree.com/t/drawdecisiontree/container-orchestration.html",
    "json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/container-orchestration/tree.json",
    "dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/container-orchestration/tree.dag"
  },
  "meta": {
    "description": "Container orchestration is foundational infrastructure — the platform you choose shapes how you deploy, scale, network, and operate every service you run. This tree eliminates options that don't match your operational maturity, cloud provider commitment, and workload complexity, so you land on the platform that fits your team today without over-engineering for a scale you haven't reached.",
    "mode": "elimination",
    "entry": "Q1",
    "tags": [
      "devops",
      "containers",
      "kubernetes",
      "infrastructure",
      "cloud",
      "engineering"
    ],
    "image": "https://images.unsplash.com/photo-1605745341112-85968b19335b?w=1200&q=80"
  },
  "questions": [
    {
      "id": "Q1",
      "text": "Does your team have Kubernetes expertise or the capacity to build it — a platform engineer, SRE, or willingness to invest several months in operational learning?"
    },
    {
      "id": "A",
      "text": "YES — dedicated platform expertise or strong willingness to invest [EKS, GKE, AKS, SELF-K8S]"
    },
    {
      "id": "B",
      "text": "NO — small team, want simplicity, focus is on product not platform [ECS, FLY, RENDER]"
    },
    {
      "id": "Q2",
      "text": "Are you committed to a specific cloud provider?"
    },
    {
      "id": "A",
      "text": "AWS [EKS, SELF-K8S]"
    },
    {
      "id": "B",
      "text": "GCP [GKE, SELF-K8S]"
    },
    {
      "id": "C",
      "text": "Azure [AKS, SELF-K8S]"
    },
    {
      "id": "D",
      "text": "Multi-cloud or on-premise [SELF-K8S]"
    },
    {
      "id": "Q3",
      "text": "Do you need to run containers on-premise or in an air-gapped environment?"
    },
    {
      "id": "A",
      "text": "YES — must run on our own infrastructure [SELF-K8S]"
    },
    {
      "id": "B",
      "text": "NO — cloud-hosted is fine [ECS, FLY, RENDER]"
    },
    {
      "id": "Q4",
      "text": "Do you need to run stateful workloads — databases, message brokers, or other services with persistent storage requirements — on the same platform?"
    },
    {
      "id": "A",
      "text": "YES — significant stateful workloads [ECS]"
    },
    {
      "id": "B",
      "text": "NO — primarily stateless applications [FLY, RENDER, ECS]"
    }
  ],
  "outcomes": [
    {
      "id": "EKS",
      "label": "Amazon EKS"
    },
    {
      "id": "GKE",
      "label": "Google Kubernetes Engine"
    },
    {
      "id": "AKS",
      "label": "Azure Kubernetes Service"
    },
    {
      "id": "SELF-K8S",
      "label": "Self-Managed Kubernetes"
    },
    {
      "id": "ECS",
      "label": "Amazon ECS"
    },
    {
      "id": "FLY",
      "label": "Fly.io"
    },
    {
      "id": "RENDER",
      "label": "Render"
    }
  ],
  "dsl": "dag: Container Orchestration Platform Selection\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1605745341112-85968b19335b?w=1200&q=80\ndescription: Container orchestration is foundational infrastructure — the platform you choose shapes how you deploy, scale, network, and operate every service you run. This tree eliminates options that don't match your operational maturity, cloud provider commitment, and workload complexity, so you land on the platform that fits your team today without over-engineering for a scale you haven't reached.\ntags: devops, containers, kubernetes, infrastructure, cloud, engineering\nentry: Q1\nmode: elimination\n\nQ1: Does your team have Kubernetes expertise or the capacity to build it — a platform engineer, SRE, or willingness to invest several months in operational learning?\n  hint: Kubernetes is the most powerful option but also the most operationally demanding. Concepts like pod disruption budgets, ingress controllers, RBAC, network policies, and persistent volume claims all require real expertise to configure correctly and debug under pressure. Managed Kubernetes (EKS, GKE, AKS) reduces the control plane burden but not the application-level configuration complexity. If your team is three engineers building a product, Kubernetes is likely over-engineering.\n  A: YES — dedicated platform expertise or strong willingness to invest [EKS, GKE, AKS, SELF-K8S]\n  B: NO — small team, want simplicity, focus is on product not platform [ECS, FLY, RENDER]\n\nQ2: Are you committed to a specific cloud provider?\n  when: Q1=A\n  hint: Managed Kubernetes services (EKS, GKE, AKS) are deeply integrated with their respective cloud ecosystems — IAM, load balancers, storage, and networking are all wired up out of the box. 
Self-managed Kubernetes (kubeadm, k3s, RKE) is cloud-agnostic and the right choice for on-premise or multi-cloud requirements.\n  A: AWS [EKS, SELF-K8S]\n  B: GCP [GKE, SELF-K8S]\n  C: Azure [AKS, SELF-K8S]\n  D: Multi-cloud or on-premise [SELF-K8S]\n\nQ3: Do you need to run containers on-premise or in an air-gapped environment?\n  when: Q1=B\n  hint: Managed platforms (ECS, Fly.io, Render) are provider-hosted services — your workloads run on the provider's infrastructure. On-premise or air-gapped requirements rule these out entirely.\n  A: YES — must run on our own infrastructure [SELF-K8S]\n  B: NO — cloud-hosted is fine [ECS, FLY, RENDER]\n\nQ4: Do you need to run stateful workloads — databases, message brokers, or other services with persistent storage requirements — on the same platform?\n  when: Q3=B\n  hint: Fly.io and Render support persistent volumes, but they are better suited to stateless applications. If you are running PostgreSQL, Redis, or Kafka in containers alongside your application, ECS with EBS volumes or a managed database service is a more robust choice. Prefer managed database services (RDS, Cloud SQL) over running databases in containers on simplified platforms.\n  A: YES — significant stateful workloads [ECS]\n  B: NO — primarily stateless applications [FLY, RENDER, ECS]\n\n[EKS]: Amazon EKS\n  color: #FF9900\n  description: Amazon Elastic Kubernetes Service (EKS) is AWS's managed Kubernetes offering — it operates the Kubernetes control plane (API server, etcd, scheduler) as a fully managed service with automatic platform-version patching and a 99.95% SLA, while you manage worker node groups using EC2 instances or Fargate (serverless nodes). EKS integrates natively with the AWS ecosystem: IAM for pod-level authentication (IRSA), the AWS Load Balancer Controller (formerly the ALB Ingress Controller) for load balancing, EBS and EFS CSI drivers for persistent storage, and VPC CNI for pod networking. 
It is the natural choice for AWS-committed organisations with Kubernetes expertise who need the full control and flexibility of Kubernetes. The trade-off versus GKE is that EKS requires more manual configuration to reach the same level of out-of-the-box functionality — AWS offers more building blocks but requires more assembly than GKE's highly integrated defaults.\n  code: ORCH_EKS\n\n[GKE]: Google Kubernetes Engine\n  color: #4285F4\n  description: Google Kubernetes Engine (GKE) is widely regarded as the most polished managed Kubernetes service — Kubernetes was created at Google, and GKE shows it. Its Autopilot mode provides a fully serverless Kubernetes experience where you define workloads and Google manages node provisioning, scaling, and optimisation; Standard mode gives full control over node pools. GKE integrates seamlessly with Google Cloud: Workload Identity for pod-level IAM, Cloud Load Balancing, Filestore and Persistent Disk for storage, and VPC-native networking. GKE's release channels (Rapid, Regular, Stable) simplify Kubernetes version management. Cloud Run, a separate Google Cloud service that is compatible with the Knative Serving API, provides a serverless abstraction for containerised workloads that don't need full Kubernetes control. GKE Autopilot is the recommended starting point for most GCP-native teams — you get Kubernetes compatibility without the node management overhead.\n  code: ORCH_GKE\n\n[AKS]: Azure Kubernetes Service\n  color: #0078D4\n  description: Azure Kubernetes Service (AKS) is Microsoft's managed Kubernetes service, tightly integrated with the Azure ecosystem — Microsoft Entra ID (formerly Azure AD) for identity and RBAC, Azure Container Registry, Azure Monitor for logs and metrics, and Azure Disk and Files for persistent storage. On the Free tier, AKS manages the control plane at no charge and you pay only for worker node VMs; the paid tiers add an uptime SLA. 
It is the natural choice for Azure-committed organisations, particularly those using Microsoft 365, Active Directory, and .NET-heavy stacks where Azure AD integration is valuable. AKS has improved significantly in recent versions and is a competitive offering compared to EKS and GKE. Azure's Virtual Machine Scale Sets power node auto-scaling, and the KEDA (Kubernetes Event-driven Autoscaling) project — originated by Microsoft — provides fine-grained scaling based on external event sources like Azure Service Bus, Kafka topics, and HTTP request rate.\n  code: ORCH_AKS\n\n[SELF-K8S]: Self-Managed Kubernetes\n  color: #326CE5\n  description: Self-managed Kubernetes — deployed using kubeadm, k3s (lightweight, ideal for edge or resource-constrained environments), RKE2 (Rancher), or OpenShift — gives complete control over the cluster configuration and is the only option for on-premise, air-gapped, or multi-cloud deployments where managed services are not available. k3s is particularly well-suited to edge deployments, Raspberry Pi clusters, and CI/CD environments where a full Kubernetes distribution would be excessive. OpenShift (Red Hat) adds enterprise features (integrated container registry, developer console, compliance profiles) and enterprise support, making it the standard for large regulated enterprises. The operational investment for self-managed Kubernetes is substantial: etcd backups, control plane high availability, certificate rotation, node OS patching, and Kubernetes version upgrades are all your responsibility. Invest in tooling (Cluster API, Talos Linux, or Rancher) to manage the lifecycle at scale rather than operating nodes manually.\n  code: ORCH_SELF_K8S\n\n[ECS]: Amazon ECS\n  color: #FF9900\n  description: Amazon Elastic Container Service (ECS) is AWS's own container orchestration service — simpler than Kubernetes but deeply integrated with the AWS platform. 
ECS uses a task definition model (a JSON document describing container images, environment, CPU/memory, and networking) and deploys tasks into services with rolling update and blue/green deployment support via CodeDeploy. The Fargate launch type eliminates all EC2 management — you specify the task and AWS handles the underlying compute entirely. ECS integrates natively with ALB for load balancing, EFS and EBS for storage, IAM task roles for fine-grained permissions, and CloudWatch for logs and metrics. It is the right choice for AWS-committed teams who find Kubernetes too complex for their actual workload — ECS covers the vast majority of container orchestration needs with significantly less operational overhead, and scales to very large workloads. The trade-off is AWS lock-in and fewer ecosystem options compared to Kubernetes, where the CNCF landscape provides portable solutions.\n  code: ORCH_ECS\n\n[FLY]: Fly.io\n  color: #7E22CE\n  description: Fly.io is a developer-focused application platform that runs containers as lightweight VMs (Firecracker microVMs) on bare metal servers distributed across 30+ regions worldwide. Deployment is a single CLI command (`fly deploy`); the platform handles DNS, TLS, load balancing, and geographic routing automatically. Fly.io's anycast network routes users to the nearest available instance, making it a strong choice for latency-sensitive global applications. It supports persistent volumes, private networking between applications, and Fly Postgres (Postgres clusters that run as Fly apps, which Fly.io documents as not a fully managed database service). The platform is designed for developers who want more control over infrastructure cost and behaviour than Render allows, while avoiding Kubernetes complexity. Fly.io is particularly popular for full-stack applications, Elixir/Phoenix, and applications built with Rails or Django where the framework handles most complexity. 
Its pricing is usage-based and extremely competitive for small-to-medium applications.\n  code: ORCH_FLY\n\n[RENDER]: Render\n  color: #46E3B7\n  description: Render is the simplest fully managed deployment platform on this list — connect a Git repository, and Render builds and deploys your application automatically on every push with zero configuration. It provides native services for web servers, background workers, cron jobs, PostgreSQL, Redis, and static sites, all managed through a clean dashboard or infrastructure-as-code (render.yaml). Render handles SSL, CDN, autoscaling, and DDoS protection automatically. It is the right choice for small teams, early-stage products, and developers who want to ship quickly without any infrastructure overhead. The trade-off is less control: you cannot customise the underlying infrastructure, networking, or runtime beyond what Render exposes. At larger scale, the cost per instance is higher than running equivalent workloads on EC2 or GKE directly, and the platform's constraints (no custom network topologies, limited configuration options) may become limiting as your requirements grow. Render is an excellent starting point that can serve teams well into significant scale.\n  code: ORCH_RENDER\n"
}