Message Queue and Event Bus Selection

By Andrew

Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.

Open interactive version →

Overview

Type: Elimination matrix
Tags: backend, messaging, event driven, architecture, infrastructure, data
Entry: Q1
Questions: 4
Outcomes: 5
Author: Andrew
Last updated: 2026-05-12

Decision Tree

Start: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?

A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]

B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
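The difference Q1 is probing can be sketched with a toy in-memory model. `ToyLog` and `ToyQueue` are hypothetical classes, not any broker's client API. A log keeps entries after they are read, so a consumer that joins late can start from offset 0. A classic queue deletes each message on acknowledgement, so nothing is left to replay.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only log: reading never removes entries (log-based brokers)."""
    entries: list[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message and return its offset."""
        self.entries.append(message)
        return len(self.entries) - 1

    def read_from(self, offset: int) -> list[str]:
        """Return every retained entry at or after `offset`; nothing is deleted."""
        return self.entries[offset:]


@dataclass
class ToyQueue:
    """Classic queue: acknowledging a message deletes it."""
    pending: list[str] = field(default_factory=list)

    def publish(self, message: str) -> None:
        self.pending.append(message)

    def consume_and_ack(self) -> str | None:
        """Deliver the oldest message and delete it, as an acknowledgement would."""
        return self.pending.pop(0) if self.pending else None


log = ToyLog()
queue = ToyQueue()
for event in ["created", "paid", "shipped"]:
    log.append(event)
    queue.publish(event)

while queue.consume_and_ack() is not None:  # one consumer drains and acks everything
    pass

print(log.read_from(0))  # ['created', 'paid', 'shipped']: a late consumer sees full history
print(queue.pending)     # []: the queue has nothing left to replay
```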

Machine-Readable JSON (Canonical Model)

{
  "_meta": {
    "schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
    "source": "https://www.drawdecisiontree.com",
    "description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
    "links": {
      "interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
      "embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/message-queue",
      "dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
      "guides": "https://www.drawdecisiontree.com/guides",
      "schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
      "author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
    },
    "generated_at": "2026-05-29T12:05:39.330Z"
  },
  "author": {
    "handle": "drawdecisiontree",
    "first_name": "Andrew",
    "last_name": null,
    "avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
    "display_name": "Andrew"
  },
  "file": {
    "id": "875e7e17-bb6d-4ec4-afca-e5a235c7dee6",
    "name": "Message Queue and Event Bus Selection",
    "public_slug": "message-queue",
    "updated_at": "2026-05-12T16:53:43.587978+00:00",
    "url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
    "json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.json",
    "dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.dag"
  },
  "meta": {
    "description": "Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.",
    "mode": "elimination",
    "entry": "Q1",
    "tags": [
      "backend",
      "messaging",
      "event driven",
      "architecture",
      "infrastructure",
      "data"
    ],
    "image": "https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80"
  },
  "questions": [
    {
      "id": "Q1",
      "text": "Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?"
    },
    {
      "id": "A",
      "text": "YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]"
    },
    {
      "id": "B",
      "text": "NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    },
    {
      "id": "Q2",
      "text": "Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?"
    },
    {
      "id": "A",
      "text": "AWS — exclusively [SQS, KAFKA]"
    },
    {
      "id": "B",
      "text": "GCP — exclusively [PUBSUB, KAFKA]"
    },
    {
      "id": "C",
      "text": "Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]"
    },
    {
      "id": "Q3",
      "text": "Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?"
    },
    {
      "id": "A",
      "text": "YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]"
    },
    {
      "id": "B",
      "text": "NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    },
    {
      "id": "Q4",
      "text": "Do you need to sustain peak ingestion above 100,000 messages per second per topic?"
    },
    {
      "id": "A",
      "text": "YES — extreme throughput, event streaming at internet scale [KAFKA]"
    },
    {
      "id": "B",
      "text": "NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
    }
  ],
  "outcomes": [
    {
      "id": "KAFKA",
      "label": "Apache Kafka"
    },
    {
      "id": "RABBITMQ",
      "label": "RabbitMQ"
    },
    {
      "id": "SQS",
      "label": "Amazon SQS"
    },
    {
      "id": "PUBSUB",
      "label": "Google Cloud Pub/Sub"
    },
    {
      "id": "REDIS-STREAMS",
      "label": "Redis Streams"
    }
  ],
  "dsl": "dag: Message Queue and Event Bus Selection\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80\ndescription: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.\ntags: backend, messaging, event driven, architecture, infrastructure, data\nentry: Q1\nmode: elimination\n\nQ1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?\n  hint: Kafka retains messages for a configurable time or size limit (or indefinitely), and Redis Streams keep entries until they are trimmed by length or minimum ID. Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged, so consumed messages cannot be re-read. Google Cloud Pub/Sub can replay within a bounded retention window (by seeking to a snapshot or timestamp) but is not designed as a long-term event history. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.\n  A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]\n  B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?\n  hint: Managed cloud brokers (SQS, Pub/Sub) remove broker operations entirely: there are no servers to patch, scale, or keep healthy. The trade-off is cloud lock-in and less control over advanced configuration. Kafka remains a candidate on a single cloud because managed offerings such as Amazon MSK and Confluent Cloud exist. If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.\n  A: AWS — exclusively [SQS, KAFKA]\n  B: GCP — exclusively [PUBSUB, KAFKA]\n  C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]\n\nQ3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?\n  hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing; complex per-message routing patterns are not its strength. SQS and Pub/Sub support dead-letter queues (dead-letter topics in Pub/Sub) but not exchange-style routing or priority queues.\n  A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]\n  B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?\n  hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.\n  A: YES — extreme throughput, event streaming at internet scale [KAFKA]\n  B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\n[KAFKA]: Apache Kafka\n  color: #231F20\n  description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable event logs with ordering guaranteed within each partition. Its core abstraction is the partitioned, replicated log: messages are appended sequentially and retained for a configurable time or size limit (or indefinitely), so any consumer can read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Each service can consume the same event stream through its own consumer group without interfering with other services, while consumers within one group split the topic's partitions between them. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper, replaced by KRaft, which has been production-ready since Kafka 3.3 and is the only mode in Kafka 4.0), partition rebalancing, and retention management require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.\n  code: MQ_KAFKA\n\n[RABBITMQ]: RabbitMQ\n  color: #FF6600\n  description: RabbitMQ is a battle-tested open-source message broker built around the AMQP 0-9-1 protocol. Its exchange-and-queue model provides exceptional routing flexibility: direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged message counts, consumer counts, and message rates. RabbitMQ primarily pushes messages to subscribed consumers as they arrive (bounded by a per-consumer prefetch limit), unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.\n  code: MQ_RABBITMQ\n\n[SQS]: Amazon SQS\n  color: #FF9900\n  description: Amazon SQS is AWS's fully managed, serverless message queue service, with no brokers to provision, patch, or scale. It provides two queue types. Standard queues offer at-least-once delivery with best-effort ordering and nearly unlimited throughput. FIFO queues provide strict ordering within each message group and exactly-once processing through deduplication within a 5-minute window, at up to 300 transactions per second per API action by default (3,000 messages per second with batching), or more with high-throughput FIFO mode. SQS integrates natively with Lambda (event source mappings poll the queue and invoke functions with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of receives, enabling debugging without message loss. SQS does not support event replay: a message is gone once a consumer deletes it after processing. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.\n  code: MQ_SQS\n\n[PUBSUB]: Google Cloud Pub/Sub\n  color: #4285F4\n  description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. Subscriptions can retain messages (including acknowledged ones) for up to 7 days, topics can retain messages for up to 31 days, and a subscription can seek to a snapshot or past timestamp to replay them. Pub/Sub scales automatically without capacity planning. It integrates natively with Dataflow (stream processing), BigQuery (BigQuery subscriptions write messages directly into a table), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.\n  code: MQ_PUBSUB\n\n[REDIS-STREAMS]: Redis Streams\n  color: #DC382D\n  description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities (event logs, consumer groups, message acknowledgement, and replay from any retained entry ID) with the operational simplicity of a Redis instance you may already be running. Durability depends on the instance's RDB or AOF persistence settings. Streams support consumer groups (consumers in a group compete to process messages from a stream), a pending entries list per group (messages delivered to a consumer but not yet acknowledged with XACK), and blocking reads (XREAD or XREADGROUP with BLOCK) for low-latency consumption. Retention is controlled by trimming to a maximum entry count (MAXLEN) or a minimum entry ID (MINID) and is bounded by available memory, making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.\n  code: MQ_REDIS_STREAMS\n"
}

DSL Representation

dag: Message Queue and Event Bus Selection
version: 1.0.0
image: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80
description: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.
tags: backend, messaging, event driven, architecture, infrastructure, data
entry: Q1
mode: elimination

Q1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
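  hint: Kafka retains messages for a configurable time or size limit (or indefinitely), and Redis Streams keep entries until they are trimmed by length or minimum ID. Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged, so consumed messages cannot be re-read. Google Cloud Pub/Sub can replay within a bounded retention window (by seeking to a snapshot or timestamp) but is not designed as a long-term event history. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.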
  A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]
  B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]

Q2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?
  hint: Managed cloud brokers (SQS, Pub/Sub) remove broker operations entirely: there are no servers to patch, scale, or keep healthy. The trade-off is cloud lock-in and less control over advanced configuration. Kafka remains a candidate on a single cloud because managed offerings such as Amazon MSK and Confluent Cloud exist. If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.
  A: AWS — exclusively [SQS, KAFKA]
  B: GCP — exclusively [PUBSUB, KAFKA]
  C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]

Q3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?
  hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing; complex per-message routing patterns are not its strength. SQS and Pub/Sub support dead-letter queues (dead-letter topics in Pub/Sub) but not exchange-style routing or priority queues.
  A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]
  B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]

Q4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?
  hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.
  A: YES — extreme throughput, event streaming at internet scale [KAFKA]
  B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]

[KAFKA]: Apache Kafka
  color: #231F20
  description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable event logs with ordering guaranteed within each partition. Its core abstraction is the partitioned, replicated log: messages are appended sequentially and retained for a configurable time or size limit (or indefinitely), so any consumer can read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Each service can consume the same event stream through its own consumer group without interfering with other services, while consumers within one group split the topic's partitions between them. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper, replaced by KRaft, which has been production-ready since Kafka 3.3 and is the only mode in Kafka 4.0), partition rebalancing, and retention management require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.
  code: MQ_KAFKA

[RABBITMQ]: RabbitMQ
  color: #FF6600
  description: RabbitMQ is a battle-tested open-source message broker built around the AMQP 0-9-1 protocol. Its exchange-and-queue model provides exceptional routing flexibility: direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged message counts, consumer counts, and message rates. RabbitMQ primarily pushes messages to subscribed consumers as they arrive (bounded by a per-consumer prefetch limit), unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.
  code: MQ_RABBITMQ

[SQS]: Amazon SQS
  color: #FF9900
  description: Amazon SQS is AWS's fully managed, serverless message queue service, with no brokers to provision, patch, or scale. It provides two queue types. Standard queues offer at-least-once delivery with best-effort ordering and nearly unlimited throughput. FIFO queues provide strict ordering within each message group and exactly-once processing through deduplication within a 5-minute window, at up to 300 transactions per second per API action by default (3,000 messages per second with batching), or more with high-throughput FIFO mode. SQS integrates natively with Lambda (event source mappings poll the queue and invoke functions with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of receives, enabling debugging without message loss. SQS does not support event replay: a message is gone once a consumer deletes it after processing. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.
  code: MQ_SQS

[PUBSUB]: Google Cloud Pub/Sub
  color: #4285F4
  description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. Subscriptions can retain messages (including acknowledged ones) for up to 7 days, topics can retain messages for up to 31 days, and a subscription can seek to a snapshot or past timestamp to replay them. Pub/Sub scales automatically without capacity planning. It integrates natively with Dataflow (stream processing), BigQuery (BigQuery subscriptions write messages directly into a table), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.
  code: MQ_PUBSUB

[REDIS-STREAMS]: Redis Streams
  color: #DC382D
  description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities (event logs, consumer groups, message acknowledgement, and replay from any retained entry ID) with the operational simplicity of a Redis instance you may already be running. Durability depends on the instance's RDB or AOF persistence settings. Streams support consumer groups (consumers in a group compete to process messages from a stream), a pending entries list per group (messages delivered to a consumer but not yet acknowledged with XACK), and blocking reads (XREAD or XREADGROUP with BLOCK) for low-latency consumption. Retention is controlled by trimming to a maximum entry count (MAXLEN) or a minimum entry ID (MINID) and is bounded by available memory, making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.
  code: MQ_REDIS_STREAMS
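The pending entries list described for Redis Streams can be illustrated with a toy model of one consumer group. The `ToyConsumerGroup` class and its methods are hypothetical and only mimic the semantics of `XADD`, `XREADGROUP ... >` and `XACK`. They are not the redis-py client API, and real stream IDs are strings such as `1700000000000-0` rather than integers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyConsumerGroup:
    """Toy model of one Redis Streams consumer group (not the redis-py API)."""
    stream: list[tuple[int, str]] = field(default_factory=list)  # (entry id, payload)
    next_index: int = 0  # index of the first entry this group has not delivered yet
    pel: dict[int, str] = field(default_factory=dict)  # pending entries: id -> consumer

    def add(self, entry_id: int, payload: str) -> None:
        # Analogous to XADD: append an entry to the stream.
        self.stream.append((entry_id, payload))

    def read_group(self, consumer: str, count: int) -> list[tuple[int, str]]:
        # Analogous to XREADGROUP with ID ">": hand out never-delivered entries.
        batch = self.stream[self.next_index:self.next_index + count]
        self.next_index += len(batch)
        for entry_id, _ in batch:
            self.pel[entry_id] = consumer  # delivered, awaiting acknowledgement
        return batch

    def ack(self, entry_id: int) -> bool:
        # Analogous to XACK: drop the entry from the PEL once processing succeeds.
        return self.pel.pop(entry_id, None) is not None


group = ToyConsumerGroup()
for i, event in enumerate(["a", "b", "c"], start=1):
    group.add(i, event)

group.read_group("worker-1", 2)  # worker-1 receives entries 1 and 2
group.read_group("worker-2", 2)  # worker-2 receives entry 3
group.ack(1)
print(group.pel)  # {2: 'worker-1', 3: 'worker-2'}
```

Entry 2 stays pending for `worker-1` until it is acknowledged. In real Redis, `XPENDING` lists such entries, and `XCLAIM` or `XAUTOCLAIM` can reassign them to another consumer if the original one fails.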

Machine Access

Static JSON: /t/drawdecisiontree/message-queue/tree.json
Live JSON (SPA): /json/drawdecisiontree/message-queue
Raw DSL: /t/drawdecisiontree/message-queue/tree.dag
Canonical HTML: /t/drawdecisiontree/message-queue.html
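A minimal sketch of reading the static JSON export with Python's standard library. It assumes the relative paths above are served from `https://www.drawdecisiontree.com`, the host named in the export's `file.json_url`, without authentication. `load_tree` and `outcome_labels` are hypothetical helper names.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# Host taken from the export's file.json_url; the path is the "Static JSON" entry above.
BASE_URL = "https://www.drawdecisiontree.com"
STATIC_JSON_PATH = "/t/drawdecisiontree/message-queue/tree.json"


def load_tree(path: str = STATIC_JSON_PATH, timeout: float = 10.0) -> dict:
    """Fetch a tree export over HTTPS and decode the JSON body."""
    with urlopen(BASE_URL + path, timeout=timeout) as response:
        return json.load(response)


def outcome_labels(tree: dict) -> dict[str, str]:
    """Map outcome ids to display labels from the export's "outcomes" array."""
    return {outcome["id"]: outcome["label"] for outcome in tree.get("outcomes", [])}


# Example (requires network access):
#   tree = load_tree()
#   print(outcome_labels(tree))
#   print(tree["dsl"].splitlines()[0])  # the DSL's "dag:" header line
```

The `dsl` field carries the original source, so a client that needs the bracketed candidate lists can parse that text rather than rely on the flattened `questions` array.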

Questions in this decision tree

Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]
NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?
AWS — exclusively [SQS, KAFKA]
GCP — exclusively [PUBSUB, KAFKA]
Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]
Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?
YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]
NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Do you need to sustain peak ingestion above 100,000 messages per second per topic?
YES — extreme throughput, event streaming at internet scale [KAFKA]
NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
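For Q4, a common rule of thumb for sizing a Kafka topic is to provision at least max(T/P, T/C) partitions. Here T is the target message rate, and P and C are the rates one partition sustains on the producer and consumer side. The sketch below encodes that arithmetic. `min_partitions` is a hypothetical helper, and the per-partition rates in the usage line are placeholders that you would measure on your own hardware and message sizes.

```python
import math


def min_partitions(target_rate: float,
                   producer_rate_per_partition: float,
                   consumer_rate_per_partition: float) -> int:
    """Rule-of-thumb lower bound on partitions: max(ceil(T/P), ceil(T/C)).

    All rates are messages per second. P and C must be measured for your own
    brokers, message sizes, and consumer logic; this only does the arithmetic.
    """
    if min(target_rate, producer_rate_per_partition, consumer_rate_per_partition) <= 0:
        raise ValueError("all rates must be positive")
    by_producer = math.ceil(target_rate / producer_rate_per_partition)
    by_consumer = math.ceil(target_rate / consumer_rate_per_partition)
    # Whichever side needs more partitions is the bottleneck.
    return max(by_producer, by_consumer)


# Hypothetical per-partition rates for a 100,000 msg/s topic.
print(min_partitions(100_000, 25_000, 8_000))  # 13
```

With these assumed rates the consumer side dominates: ceil(100,000 / 8,000) = 13, versus 4 on the producer side. Consumer throughput is often the constraint because each partition is read by at most one consumer within a consumer group.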

Possible outcomes

Apache Kafka
RabbitMQ
Amazon SQS
Google Cloud Pub/Sub
Redis Streams
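Topic exchanges are one of the RabbitMQ routing features that Q3 screens for. A topic exchange matches dot-separated routing keys against binding patterns in which `*` stands for exactly one word and `#` for zero or more words. A minimal sketch of that matching rule follows. `topic_matches` is a hypothetical helper, not RabbitMQ's implementation.

```python
from functools import lru_cache


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key matches a topic-exchange binding pattern.

    Words are separated by '.'; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i indexes the pattern, j indexes the routing-key words.
        if i == len(pattern):
            return j == len(words)
        token = pattern[i]
        if token == "#":
            # '#' either matches nothing or swallows one more word.
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        return (token == "*" or token == words[j]) and match(i + 1, j + 1)

    return match(0, 0)


print(topic_matches("*.orange.*", "quick.orange.rabbit"))       # True
print(topic_matches("*.orange.*", "quick.orange.male.rabbit"))  # False
print(topic_matches("lazy.#", "lazy"))                          # True
```

Memoizing on the pair of positions keeps patterns with several `#` tokens from backtracking exponentially.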

How to use this decision tree

Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.
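In elimination mode, each answer lists in brackets the outcomes it keeps. The sketch below assumes the tool intersects those lists across every answered question; the DSL reference is the authority on the exact semantics. The DSL text is abridged to its answer lines, with the bracketed lists copied unchanged. `parse_answers` and `eliminate` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Abridged answer lines from the DSL above; the bracketed lists are unchanged.
DSL = """\
Q1: Replay historical messages?
  A: YES [KAFKA, REDIS-STREAMS]
  B: NO [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Q2: Single cloud, fully managed?
  A: AWS [SQS, KAFKA]
  B: GCP [PUBSUB, KAFKA]
  C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]
Q3: Complex routing?
  A: YES [RABBITMQ, KAFKA]
  B: NO [SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Q4: Above 100,000 msg/s per topic?
  A: YES [KAFKA]
  B: NO [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
"""

QUESTION = re.compile(r"^(Q\d+):")
ANSWER = re.compile(r"^\s+([A-Z]):.*\[([^\]]+)\]\s*$")


def parse_answers(dsl: str) -> dict[str, dict[str, set[str]]]:
    """Map question id -> answer letter -> set of outcome ids that answer keeps."""
    tree: dict[str, dict[str, set[str]]] = {}
    current = None
    for line in dsl.splitlines():
        if m := QUESTION.match(line):
            current = m.group(1)
            tree[current] = {}
        elif current and (m := ANSWER.match(line)):
            tree[current][m.group(1)] = {s.strip() for s in m.group(2).split(",")}
    return tree


def eliminate(tree: dict[str, dict[str, set[str]]], answers: dict[str, str]) -> set[str]:
    """Start from every outcome and intersect with each chosen answer's set."""
    remaining = set().union(*(opts for q in tree.values() for opts in q.values()))
    for qid, letter in answers.items():
        remaining &= tree[qid][letter]
    return remaining


tree = parse_answers(DSL)
# No replay, AWS only, no complex routing, moderate throughput:
print(sorted(eliminate(tree, {"Q1": "B", "Q2": "A", "Q3": "B", "Q4": "B"})))  # ['KAFKA', 'SQS']
# Replay required, multi-cloud, complex routing:
print(sorted(eliminate(tree, {"Q1": "A", "Q2": "C", "Q3": "A"})))  # ['KAFKA']
```

As the first call shows, elimination can leave more than one broker standing; the sketch does not rank the survivors.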