Message Queue and Event Bus Selection

Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.

Overview

Type: Elimination matrix
Tags: backend, messaging, event driven, architecture, infrastructure, data
Entry: Q1
Questions: 13
Outcomes: 4
Author: Andrew
Last updated: 2026-05-12

Decision Tree

Start: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?

A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]

B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
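
The narrowing that each answer performs can be sketched as a set intersection. This is a minimal illustration, assuming that elimination mode keeps only the outcomes listed in brackets after every chosen answer; the tool's actual algorithm is not documented on this page. The candidate sets are copied from the answers in the DSL section of this page, and the names `TREE` and `eliminate` are hypothetical.

```python
# Candidate sets copied from the bracketed lists after each answer in this tree.
ALL = {"KAFKA", "RABBITMQ", "SQS", "PUBSUB", "REDIS-STREAMS"}

TREE = {
    "Q1": {"A": {"KAFKA", "REDIS-STREAMS"}, "B": ALL},
    "Q2": {
        "A": {"SQS", "KAFKA"},
        "B": {"PUBSUB", "KAFKA"},
        "C": {"KAFKA", "RABBITMQ", "REDIS-STREAMS"},
    },
    "Q3": {
        "A": {"RABBITMQ", "KAFKA"},
        "B": {"SQS", "PUBSUB", "REDIS-STREAMS", "KAFKA"},
    },
    "Q4": {"A": {"KAFKA"}, "B": ALL},
}


def eliminate(answers):
    """Intersect the candidate sets of the chosen answers (assumed semantics)."""
    remaining = set(ALL)
    for question, choice in answers.items():
        remaining &= TREE[question][choice]
    return remaining


# Replay required, multi-cloud, simple routing, moderate throughput.
shortlist = eliminate({"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "B"})
print(sorted(shortlist))  # ['KAFKA', 'REDIS-STREAMS']
```

Under this reading, an answer that lists every broker (such as Q1 option B) eliminates nothing, and Kafka survives every path because it appears in every bracketed list.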

Machine-Readable JSON (Canonical Model)

{
  "_meta": {
    "schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
    "source": "https://www.drawdecisiontree.com",
    "description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
    "links": {
      "interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
      "embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/message-queue",
      "dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
      "guides": "https://www.drawdecisiontree.com/guides",
      "schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
      "author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
    },
    "generated_at": "2026-05-29T12:05:39.330Z"
  },
  "author": {
    "handle": "drawdecisiontree",
    "first_name": "Andrew",
    "last_name": null,
    "avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
    "display_name": "Andrew"
  },
  "file": {
    "id": "875e7e17-bb6d-4ec4-afca-e5a235c7dee6",
    "name": "Message Queue and Event Bus Selection",
    "public_slug": "message-queue",
    "updated_at": "2026-05-12T16:53:43.587978+00:00",
    "url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
    "json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.json",
    "dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.dag"
  },
  "meta": {
    "description": "Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.",
    "mode": "elimination",
    "entry": "Q1",
    "tags": [
      "backend",
      "messaging",
      "event driven",
      "architecture",
      "infrastructure",
      "data"
    ],
    "image": "https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80"
  },
  "questions": [
    {
      "id": "Q1",
      "text": "Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?"
    },
    {
      "id": "A",
      "text": "YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]"
    },
    {
      "id": "B",
      "text": "NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    },
    {
      "id": "Q2",
      "text": "Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?"
    },
    {
      "id": "A",
      "text": "AWS — exclusively [SQS, KAFKA]"
    },
    {
      "id": "B",
      "text": "GCP — exclusively [PUBSUB, KAFKA]"
    },
    {
      "id": "C",
      "text": "Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]"
    },
    {
      "id": "Q3",
      "text": "Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?"
    },
    {
      "id": "A",
      "text": "YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]"
    },
    {
      "id": "B",
      "text": "NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    },
    {
      "id": "Q4",
      "text": "Do you need to sustain peak ingestion above 100,000 messages per second per topic?"
    },
    {
      "id": "A",
      "text": "YES — extreme throughput, event streaming at internet scale [KAFKA]"
    },
    {
      "id": "B",
      "text": "NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    }
  ],
  "outcomes": [
    {
      "id": "KAFKA",
      "label": "Apache Kafka"
    },
    {
      "id": "RABBITMQ",
      "label": "RabbitMQ"
    },
    {
      "id": "SQS",
      "label": "Amazon SQS"
    },
    {
      "id": "PUBSUB",
      "label": "Google Cloud Pub/Sub"
    }
  ],
  "dsl": "dag: Message Queue and Event Bus Selection\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80\ndescription: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.\ntags: backend, messaging, event driven, architecture, infrastructure, data\nentry: Q1\nmode: elimination\n\nQ1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?\n  hint: Kafka and Redis Streams retain messages for a configurable duration (or indefinitely). Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged — once consumed, they are gone. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.\n  A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]\n  B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?\n  hint: Managed cloud brokers (SQS, Pub/Sub) eliminate all infrastructure overhead — no brokers to patch, scale, or monitor. The trade-off is cloud lock-in and less control over advanced configuration. If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.\n  A: AWS — exclusively [SQS, KAFKA]\n  B: GCP — exclusively [PUBSUB, KAFKA]\n  C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]\n\nQ3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?\n  hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing — complex per-message routing patterns are not its strength.\n  A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]\n  B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?\n  hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.\n  A: YES — extreme throughput, event streaming at internet scale [KAFKA]\n  B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\n[KAFKA]: Apache Kafka\n  color: #231F20\n  description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable, ordered event logs. Its core abstraction is the partitioned, replicated log — messages are appended sequentially and retained for a configurable duration (hours, days, or indefinitely), allowing any consumer to read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Kafka's consumer group model allows multiple independent services, each with its own consumer group, to consume the same event stream in parallel without interfering with each other. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper in older releases, KRaft since it became production-ready in Kafka 3.3 and the only option from Kafka 4.0), partition rebalancing, and retention management all require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.\n  code: MQ_KAFKA\n\n[RABBITMQ]: RabbitMQ\n  color: #FF6600\n  description: RabbitMQ is a battle-tested open-source message broker built around the AMQP protocol. Its exchange-and-queue model provides exceptional routing flexibility — direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged message counts, and message rates. RabbitMQ is a push-based broker — messages are delivered to consumers as they arrive, unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.\n  code: MQ_RABBITMQ\n\n[SQS]: Amazon SQS\n  color: #FF9900\n  description: Amazon SQS is AWS's fully managed, serverless message queue service — no brokers to provision, patch, or scale. It provides two queue types: Standard queues offer at-least-once delivery with best-effort ordering and virtually unlimited throughput; FIFO queues provide exactly-once processing and strict ordering within a message group, at up to 300 API calls per second per action (3,000 messages per second with batching), with higher limits in high-throughput mode. SQS integrates natively with Lambda (an event source mapping polls the queue and invokes the function with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of retries, enabling debugging without message loss. SQS does not support event replay: once a consumer deletes a message after processing it, the message is gone. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.\n  code: MQ_SQS\n\n[PUBSUB]: Google Cloud Pub/Sub\n  color: #4285F4\n  description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. A subscription can retain messages, including acknowledged ones, for up to 7 days, and topic-level retention extends this to 31 days; subscribers replay by seeking to a timestamp or a snapshot. Pub/Sub scales automatically without capacity planning, subject to project quotas. It integrates natively with Dataflow (stream processing), BigQuery (direct subscription ingest), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.\n  code: MQ_PUBSUB\n\n[REDIS-STREAMS]: Redis Streams\n  color: #DC382D\n  description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities — persistent event logs, consumer groups, message acknowledgement, and replayable offsets — with the operational simplicity of a Redis instance you may already be running. Streams support consumer groups (multiple consumers compete to process messages from a stream), pending entries lists (messages delivered to a consumer but not yet acknowledged), and blocking XREAD and XREADGROUP calls for low-latency consumption. Retention is controlled by trimming, either to a maximum entry count (MAXLEN) or to a minimum entry ID (MINID), and the whole stream must fit in memory, making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.\n  code: MQ_REDIS_STREAMS\n"
}
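
In this export the `questions` array is flat: each question entry is followed by its answer entries, and answer ids such as `A` and `B` repeat across questions. The sketch below shows one way to regroup it, assuming that an entry whose text ends in a bracketed list is an answer and that it belongs to the nearest preceding question. The function name `answer_sets` and the abbreviated `sample` texts are hypothetical; real code would load the file served at `json_url`.

```python
import json
import re

# A trailing "[ID, ID, ...]" list at the end of an answer's text.
CANDIDATES = re.compile(r"\[([A-Z0-9\-,\s]+)\]\s*$")


def answer_sets(export):
    """Group each answer's outcome ids under the question that precedes it."""
    tree = {}
    current = None
    for entry in export["questions"]:
        match = CANDIDATES.search(entry["text"])
        if match is None:
            # No bracketed list: treat the entry as a new question.
            current = entry["id"]
            tree[current] = {}
        elif current is not None:
            ids = {part.strip() for part in match.group(1).split(",")}
            tree[current][entry["id"]] = ids
    return tree


# Abbreviated stand-in for the export; only the shape matches the real file.
sample = json.loads("""
{"questions": [
  {"id": "Q4", "text": "Do you need more than 100,000 messages per second?"},
  {"id": "A", "text": "YES [KAFKA]"},
  {"id": "B", "text": "NO [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"}
]}
""")
print(answer_sets(sample))
```

Relying on position rather than on ids is deliberate here, because the repeated `A`/`B` ids cannot identify an answer on their own.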

DSL Representation

dag: Message Queue and Event Bus Selection
version: 1.0.0
image: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80
description: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.
tags: backend, messaging, event driven, architecture, infrastructure, data
entry: Q1
mode: elimination

Q1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
  hint: Kafka and Redis Streams retain messages for a configurable duration (or indefinitely). Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged — once consumed, they are gone. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.
  A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]
  B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]

Q2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?
  hint: Managed cloud brokers (SQS, Pub/Sub) eliminate all infrastructure overhead — no brokers to patch, scale, or monitor. The trade-off is cloud lock-in and less control over advanced configuration. If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.
  A: AWS — exclusively [SQS, KAFKA]
  B: GCP — exclusively [PUBSUB, KAFKA]
  C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]

Q3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?
  hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing — complex per-message routing patterns are not its strength.
  A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]
  B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]

Q4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?
  hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.
  A: YES — extreme throughput, event streaming at internet scale [KAFKA]
  B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]

[KAFKA]: Apache Kafka
  color: #231F20
  description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable, ordered event logs. Its core abstraction is the partitioned, replicated log — messages are appended sequentially and retained for a configurable duration (hours, days, or indefinitely), allowing any consumer to read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Kafka's consumer group model allows multiple independent services, each with its own consumer group, to consume the same event stream in parallel without interfering with each other. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper in older releases, KRaft since it became production-ready in Kafka 3.3 and the only option from Kafka 4.0), partition rebalancing, and retention management all require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.
  code: MQ_KAFKA

[RABBITMQ]: RabbitMQ
  color: #FF6600
  description: RabbitMQ is a battle-tested open-source message broker built around the AMQP protocol. Its exchange-and-queue model provides exceptional routing flexibility — direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged message counts, and message rates. RabbitMQ is a push-based broker — messages are delivered to consumers as they arrive, unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.
  code: MQ_RABBITMQ

[SQS]: Amazon SQS
  color: #FF9900
  description: Amazon SQS is AWS's fully managed, serverless message queue service — no brokers to provision, patch, or scale. It provides two queue types: Standard queues offer at-least-once delivery with best-effort ordering and virtually unlimited throughput; FIFO queues provide exactly-once processing and strict ordering within a message group, at up to 300 API calls per second per action (3,000 messages per second with batching), with higher limits in high-throughput mode. SQS integrates natively with Lambda (an event source mapping polls the queue and invokes the function with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of retries, enabling debugging without message loss. SQS does not support event replay: once a consumer deletes a message after processing it, the message is gone. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.
  code: MQ_SQS

[PUBSUB]: Google Cloud Pub/Sub
  color: #4285F4
  description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. A subscription can retain messages, including acknowledged ones, for up to 7 days, and topic-level retention extends this to 31 days; subscribers replay by seeking to a timestamp or a snapshot. Pub/Sub scales automatically without capacity planning, subject to project quotas. It integrates natively with Dataflow (stream processing), BigQuery (direct subscription ingest), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.
  code: MQ_PUBSUB

[REDIS-STREAMS]: Redis Streams
  color: #DC382D
  description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities — persistent event logs, consumer groups, message acknowledgement, and replayable offsets — with the operational simplicity of a Redis instance you may already be running. Streams support consumer groups (multiple consumers compete to process messages from a stream), pending entries lists (messages delivered to a consumer but not yet acknowledged), and blocking XREAD and XREADGROUP calls for low-latency consumption. Retention is controlled by trimming, either to a maximum entry count (MAXLEN) or to a minimum entry ID (MINID), and the whole stream must fit in memory, making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.
  code: MQ_REDIS_STREAMS
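
The distinction behind Q1, a queue that forgets a message once it is consumed versus a log that keeps entries and lets each consumer choose where to start reading, can be illustrated with a small in-memory model. This is a conceptual sketch only: `WorkQueue` and `EventLog` are hypothetical classes, not the API of any broker named above, and they ignore persistence, partitions, retention limits, and redelivery.

```python
from collections import deque


class WorkQueue:
    """Consume-once model: a message is gone after it is received and acknowledged."""

    def __init__(self):
        self._items = deque()

    def publish(self, message):
        self._items.append(message)

    def receive_and_ack(self):
        # Returns the oldest message and forgets it, or None when empty.
        return self._items.popleft() if self._items else None


class EventLog:
    """Log model: entries stay in place and readers pick their own offset."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset):
        return list(self._entries[offset:])


queue, log = WorkQueue(), EventLog()
for event in ("order-created", "order-paid"):
    queue.publish(event)
    log.append(event)

# The first consumer drains the queue and reads the whole log.
while queue.receive_and_ack() is not None:
    pass
log.read_from(0)

# A consumer that arrives later finds the queue empty but can replay the log.
print(queue.receive_and_ack())  # None
print(log.read_from(0))         # ['order-created', 'order-paid']
```

A consumer onboarded after the fact is exactly the case option A of Q1 is meant to cover: only the log model still has the history to hand over.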

How to use this decision tree

Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.