Message Queue and Event Bus Selection
Elimination matrix
Tags: backend, messaging, event driven, architecture, infrastructure, data
Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.
Overview
Decision Tree
Start: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]
B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
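The distinction behind this first question can be illustrated with a toy model. The following is a minimal in-memory sketch, not a description of how any of the listed brokers is implemented; the class names `AckQueue` and `RetainedLog` are hypothetical and exist only for this illustration.

```python
from collections import deque


class AckQueue:
    """Toy queue: a message is gone once it has been consumed and acknowledged."""

    def __init__(self):
        self._pending = deque()

    def publish(self, message):
        self._pending.append(message)

    def consume_and_ack(self):
        # Popping models "deliver, then delete on acknowledgement".
        return self._pending.popleft() if self._pending else None


class RetainedLog:
    """Toy log: entries stay in place and each reader chooses its own offset."""

    def __init__(self):
        self._entries = []

    def publish(self, message):
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the entry just appended

    def read_from(self, offset):
        # Any reader can start at any retained offset, including 0.
        return list(self._entries[offset:])


queue, log = AckQueue(), RetainedLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    queue.publish(event)
    log.publish(event)

# An existing consumer processes everything once.
drained = [queue.consume_and_ack() for _ in range(3)]

# A consumer added later finds the queue empty but can replay the log.
late_from_queue = queue.consume_and_ack()
late_from_log = log.read_from(0)
```

In this sketch the late consumer gets nothing from the queue and the full history from the log, which is the property answer A asks for.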
Machine-Readable JSON (Canonical Model)
{
"_meta": {
"schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"source": "https://www.drawdecisiontree.com",
"description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
"links": {
"interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
"embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/message-queue",
"dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
"guides": "https://www.drawdecisiontree.com/guides",
"schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
},
"generated_at": "2026-05-29T12:05:39.330Z"
},
"author": {
"handle": "drawdecisiontree",
"first_name": "Andrew",
"last_name": null,
"avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
"display_name": "Andrew"
},
"file": {
"id": "875e7e17-bb6d-4ec4-afca-e5a235c7dee6",
"name": "Message Queue and Event Bus Selection",
"public_slug": "message-queue",
"updated_at": "2026-05-12T16:53:43.587978+00:00",
"url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue.html",
"json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.json",
"dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/message-queue/tree.dag"
},
"meta": {
"description": "Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.",
"mode": "elimination",
"entry": "Q1",
"tags": [
"backend",
"messaging",
"event driven",
"architecture",
"infrastructure",
"data"
],
"image": "https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80"
},
"questions": [
{
"id": "Q1",
"text": "Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?"
},
{
"id": "A",
"text": "YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]"
},
{
"id": "B",
"text": "NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
},
{
"id": "Q2",
"text": "Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?"
},
{
"id": "A",
"text": "AWS — exclusively [SQS, KAFKA]"
},
{
"id": "B",
"text": "GCP — exclusively [PUBSUB, KAFKA]"
},
{
"id": "C",
"text": "Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]"
},
{
"id": "Q3",
"text": "Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?"
},
{
"id": "A",
"text": "YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]"
},
{
"id": "B",
"text": "NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
},
{
"id": "Q4",
"text": "Do you need to sustain peak ingestion above 100,000 messages per second per topic?"
},
{
"id": "A",
"text": "YES — extreme throughput, event streaming at internet scale [KAFKA]"
},
{
"id": "B",
"text": "NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]"
}
],
"outcomes": [
{
"id": "KAFKA",
"label": "Apache Kafka"
},
{
"id": "RABBITMQ",
"label": "RabbitMQ"
},
{
"id": "SQS",
"label": "Amazon SQS"
},
{
"id": "PUBSUB",
"label": "Google Cloud Pub/Sub"
}
],
"dsl": "dag: Message Queue and Event Bus Selection\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80\ndescription: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.\ntags: backend, messaging, event driven, architecture, infrastructure, data\nentry: Q1\nmode: elimination\n\nQ1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?\n hint: Kafka and Redis Streams retain messages for a configurable duration (or indefinitely). Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged — once consumed, they are gone. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.\n A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]\n B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?\n hint: Managed cloud brokers (SQS, Pub/Sub) eliminate all infrastructure overhead — no brokers to patch, scale, or monitor. The trade-off is cloud lock-in and less control over advanced configuration. 
If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.\n A: AWS — exclusively [SQS, KAFKA]\n B: GCP — exclusively [PUBSUB, KAFKA]\n C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]\n\nQ3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?\n hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing — complex per-message routing patterns are not its strength.\n A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]\n B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\nQ4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?\n hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.\n A: YES — extreme throughput, event streaming at internet scale [KAFKA]\n B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]\n\n[KAFKA]: Apache Kafka\n color: #231F20\n description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable, ordered event logs. 
Its core abstraction is the partitioned, replicated log: messages are appended sequentially within each partition (ordering holds per partition, not across a topic) and retained for a configurable duration (hours, days, or indefinitely), allowing any consumer to read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Kafka's consumer group model allows multiple independent services to consume the same event stream in parallel without interfering with each other. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper on older clusters, KRaft since it became production-ready in Kafka 3.3), partition rebalancing, and retention management require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.\n code: MQ_KAFKA\n\n[RABBITMQ]: RabbitMQ\n color: #FF6600\n description: RabbitMQ is a battle-tested open-source message broker built around the AMQP protocol. Its exchange-and-queue model provides exceptional routing flexibility: direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged messages, and message rates. RabbitMQ is primarily a push-based broker (messages are delivered to consumers as they arrive), unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. 
Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.\n code: MQ_RABBITMQ\n\n[SQS]: Amazon SQS\n color: #FF9900\n description: Amazon SQS is AWS's fully managed, serverless message queue service, with no brokers to provision, patch, or scale. It provides two queue types: Standard queues offer at-least-once delivery with best-effort ordering and virtually unlimited throughput; FIFO queues provide exactly-once processing and strict ordering within a message group, by default at up to 300 API calls per second per action (3,000 messages per second with batching), with a high-throughput mode for more. SQS integrates natively with Lambda (event source mappings poll the queue and invoke functions with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of receives, enabling debugging without message loss. SQS does not support event replay: messages are deleted once a consumer has processed them. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.\n code: MQ_SQS\n\n[PUBSUB]: Google Cloud Pub/Sub\n color: #4285F4\n description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. A subscription can retain messages for up to 7 days for replay, and it can seek back to a snapshot or a timestamp within that window. Pub/Sub scales automatically without capacity planning. It integrates natively with Dataflow (stream processing), BigQuery (direct subscription ingest), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. 
Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.\n code: MQ_PUBSUB\n\n[REDIS-STREAMS]: Redis Streams\n color: #DC382D\n description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities (persistent event logs, consumer groups, message acknowledgement, and replayable offsets) with the operational simplicity of a Redis instance you may already be running. Streams support consumer groups (multiple consumers compete to process messages from a stream), pending entries lists (messages delivered to a consumer but not yet acknowledged), and XREAD with blocking for low-latency consumption. Retention is controlled by trimming on entry count (MAXLEN) or minimum entry ID (MINID), making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.\n code: MQ_REDIS_STREAMS\n"
}
DSL Representation
dag: Message Queue and Event Bus Selection
version: 1.0.0
image: https://images.unsplash.com/photo-1516321318423-f06f85e504b3?w=1200&q=80
description: Messaging infrastructure is hard to replace once services are built around it — consumer groups, retention policies, and delivery guarantees become load-bearing assumptions embedded in application code. This tree eliminates brokers that don't match your durability, routing, throughput, and operational constraints, so you choose the right foundation before building on it.
tags: backend, messaging, event driven, architecture, infrastructure, data
entry: Q1
mode: elimination
Q1: Do you need to replay historical messages — re-read events from a past point in time for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
hint: Kafka and Redis Streams retain messages for a configurable duration (or indefinitely). Traditional queues like RabbitMQ and SQS delete messages once they are acknowledged — once consumed, they are gone. If any current or future consumer might need to reprocess history, choose a log-based broker from the start.
A: YES — event replay and durable retention are required [KAFKA, REDIS-STREAMS]
B: NO — messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Q2: Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?
hint: Managed cloud brokers (SQS, Pub/Sub) eliminate all infrastructure overhead — no brokers to patch, scale, or monitor. The trade-off is cloud lock-in and less control over advanced configuration. If your workload spans multiple clouds or runs on-premise, a self-hosted or cloud-agnostic option is safer.
A: AWS — exclusively [SQS, KAFKA]
B: GCP — exclusively [PUBSUB, KAFKA]
C: Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]
Q3: Do you need complex message routing — topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?
hint: RabbitMQ's exchange model (direct, topic, fanout, headers) provides fine-grained routing logic that lets producers send messages without knowing which consumers will receive them. Kafka provides partitioned topics but simpler routing — complex per-message routing patterns are not its strength.
A: YES — flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]
B: NO — simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]
Q4: Do you need to sustain peak ingestion above 100,000 messages per second per topic?
hint: At very high throughput, most brokers become bottlenecks before your application does. Kafka's partitioned, append-only log architecture is specifically designed for this scale — it can sustain millions of messages per second by adding partitions and brokers. SQS and Pub/Sub also scale well under managed operation but have per-request cost implications at extreme volume.
A: YES — extreme throughput, event streaming at internet scale [KAFKA]
B: NO — moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
[KAFKA]: Apache Kafka
color: #231F20
description: Apache Kafka is the industry-standard distributed event streaming platform, designed for high-throughput, durable, ordered event logs. Its core abstraction is the partitioned, replicated log: messages are appended sequentially within each partition (ordering holds per partition, not across a topic) and retained for a configurable duration (hours, days, or indefinitely), allowing any consumer to read from any retained offset at any time. This makes Kafka the natural choice for event sourcing, audit trails, real-time analytics pipelines, and change data capture (CDC) from databases. Kafka's consumer group model allows multiple independent services to consume the same event stream in parallel without interfering with each other. The operational overhead of self-managed Kafka is real: cluster metadata management (ZooKeeper on older clusters, KRaft since it became production-ready in Kafka 3.3), partition rebalancing, and retention management require expertise. Confluent Cloud and Amazon MSK provide managed Kafka that eliminates most of this overhead while preserving compatibility. Kafka is the right choice at scale, but its complexity is unnecessary for simple task queues or low-throughput integrations.
code: MQ_KAFKA
[RABBITMQ]: RabbitMQ
color: #FF6600
description: RabbitMQ is a battle-tested open-source message broker built around the AMQP protocol. Its exchange-and-queue model provides exceptional routing flexibility: direct exchanges for point-to-point delivery, topic exchanges for pattern-matched routing, fanout exchanges for broadcast, and headers exchanges for attribute-based routing. Dead-letter exchanges, per-message TTL, priority queues, and consumer acknowledgement with selective requeue are all first-class features. RabbitMQ is the right choice for task queues, job scheduling, work distribution across microservices, and any scenario where complex routing logic or delivery guarantees matter more than raw throughput. Its management UI provides real-time visibility into queue depths, unacknowledged messages, and message rates. RabbitMQ is primarily a push-based broker (messages are delivered to consumers as they arrive), unlike Kafka's pull-based model. Quorum queues (Raft-based) provide strong durability guarantees for critical workloads. Operationally, RabbitMQ is significantly simpler to run than Kafka for moderate workloads.
code: MQ_RABBITMQ
[SQS]: Amazon SQS
color: #FF9900
description: Amazon SQS is AWS's fully managed, serverless message queue service, with no brokers to provision, patch, or scale. It provides two queue types: Standard queues offer at-least-once delivery with best-effort ordering and virtually unlimited throughput; FIFO queues provide exactly-once processing and strict ordering within a message group, by default at up to 300 API calls per second per action (3,000 messages per second with batching), with a high-throughput mode for more. SQS integrates natively with Lambda (event source mappings poll the queue and invoke functions with batches of messages), SNS (fan-out to multiple queues), and Step Functions, making it the natural message queue for AWS-native architectures. Dead-letter queue support captures messages that fail processing after a configurable number of receives, enabling debugging without message loss. SQS does not support event replay: messages are deleted once a consumer has processed them. It is the right choice for AWS-native task queues, decoupling services within an AWS account, and serverless event-driven architectures where operational simplicity is paramount.
code: MQ_SQS
[PUBSUB]: Google Cloud Pub/Sub
color: #4285F4
description: Google Cloud Pub/Sub is a fully managed, globally distributed messaging service designed for real-time event streaming and asynchronous service integration. It provides at-least-once delivery by default, with optional exactly-once delivery on pull subscriptions. A subscription can retain messages for up to 7 days for replay, and it can seek back to a snapshot or a timestamp within that window. Pub/Sub scales automatically without capacity planning. It integrates natively with Dataflow (stream processing), BigQuery (direct subscription ingest), and Cloud Functions, making it the default messaging layer for GCP-native data pipelines. The pull and push subscription models give consumers flexibility in how they receive messages. Pub/Sub is the right choice for GCP-native architectures, real-time analytics pipelines feeding Dataflow or BigQuery, and event-driven microservices running on Cloud Run or GKE.
code: MQ_PUBSUB
[REDIS-STREAMS]: Redis Streams
color: #DC382D
description: Redis Streams is an append-only log data structure built into Redis, providing many of Kafka's core capabilities (persistent event logs, consumer groups, message acknowledgement, and replayable offsets) with the operational simplicity of a Redis instance you may already be running. Streams support consumer groups (multiple consumers compete to process messages from a stream), pending entries lists (messages delivered to a consumer but not yet acknowledged), and XREAD with blocking for low-latency consumption. Retention is controlled by trimming on entry count (MAXLEN) or minimum entry ID (MINID), making it suitable for short-to-medium term event history. Redis Streams is the right choice when you already operate Redis (for caching or session storage) and need event streaming capabilities without introducing Kafka's operational complexity. It is not the right choice for multi-terabyte event archives or scenarios requiring Kafka's partition-level scalability, but for moderate throughput workloads with a team already familiar with Redis, it is an elegant and low-cost solution.
code: MQ_REDIS_STREAMS
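A minimal sketch of reading a listing like this one programmatically. It assumes only the line shapes visible above (`Q<n>:` question lines, single-letter answer lines ending in a bracketed candidate list, and `[ID]: Label` outcome headers). The patterns are inferred from this one file rather than taken from the DSL reference, so `parse_dag` and its regular expressions are hypothetical, and the sample text uses abbreviated answer labels.

```python
import re

# Line shapes inferred from the listing above (an assumption, not the official grammar).
QUESTION = re.compile(r"^(Q\d+):\s*(.+)$")
ANSWER = re.compile(r"^([A-Z]):\s*(.+?)\s*\[([^\]]+)\]$")
OUTCOME = re.compile(r"^\[([A-Z0-9-]+)\]:\s*(.+)$")


def parse_dag(text):
    """Return (questions, outcomes) parsed from DSL text."""
    questions, outcomes = {}, {}
    current = None  # the question whose answers are being collected
    for raw in text.splitlines():
        line = raw.strip()
        if m := OUTCOME.match(line):
            outcomes[m.group(1)] = m.group(2)
            current = None  # outcome properties follow; stop collecting answers
        elif m := QUESTION.match(line):
            current = {"text": m.group(2), "answers": {}}
            questions[m.group(1)] = current
        elif current is not None and (m := ANSWER.match(line)):
            candidates = {part.strip() for part in m.group(3).split(",")}
            current["answers"][m.group(1)] = candidates
    return questions, outcomes


# Abbreviated sample; the answer labels are shortened for illustration.
sample = """\
entry: Q2
Q2: Are you exclusively on a single cloud provider?
 hint: Managed cloud brokers reduce infrastructure overhead.
 A: AWS [SQS, KAFKA]
 B: GCP [PUBSUB, KAFKA]
 C: Multi-cloud [KAFKA, RABBITMQ, REDIS-STREAMS]
[KAFKA]: Apache Kafka
 color: #231F20
[REDIS-STREAMS]: Redis Streams
"""

questions, outcomes = parse_dag(sample)
```

Header lines such as `entry:` and property lines such as `hint:` or `color:` are ignored by this sketch because they match none of the three patterns.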
Machine Access
- Static JSON: /t/drawdecisiontree/message-queue/tree.json
- Live JSON (SPA): /json/drawdecisiontree/message-queue
- Raw DSL: /t/drawdecisiontree/message-queue/tree.dag
- Canonical HTML: /t/drawdecisiontree/message-queue.html
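As a rough sketch of consuming the static JSON endpoint, the snippet below joins the path listed above with the site origin given in the export's `source` field and reads a few keys that appear in the export (`file.name`, `meta.mode`, `meta.entry`, `outcomes`). The helper names `fetch_export` and `summarize` are hypothetical, and `fetch_export` needs network access, so the example exercises `summarize` on a trimmed inline copy instead.

```python
import json
from urllib.request import urlopen

BASE = "https://www.drawdecisiontree.com"
STATIC_JSON = BASE + "/t/drawdecisiontree/message-queue/tree.json"


def fetch_export(url=STATIC_JSON, timeout=10):
    """Download and decode the export. Requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(export):
    """Pick out a few fields that appear in the export shown earlier."""
    return {
        "name": export["file"]["name"],
        "mode": export["meta"]["mode"],
        "entry": export["meta"]["entry"],
        "outcomes": {o["id"]: o["label"] for o in export["outcomes"]},
    }


# Trimmed inline copy of the export, used here so the example runs offline.
sample = json.loads("""
{
  "file": {"name": "Message Queue and Event Bus Selection"},
  "meta": {"mode": "elimination", "entry": "Q1"},
  "outcomes": [
    {"id": "KAFKA", "label": "Apache Kafka"},
    {"id": "SQS", "label": "Amazon SQS"}
  ]
}
""")

summary = summarize(sample)
```

Calling `summarize(fetch_export())` would apply the same extraction to the live file, assuming the endpoint returns the structure shown in the export.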
Questions in this decision tree
- Do you need to replay historical messages (re-read events from a past point in time) for audit trails, disaster recovery, or onboarding new consumers to existing event streams?
  - YES: event replay and durable retention are required [KAFKA, REDIS-STREAMS]
  - NO: messages are consumed once and can be discarded after acknowledgement [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
- Are you exclusively on a single cloud provider and want a fully managed, zero-ops message broker with no infrastructure to run?
  - AWS, exclusively [SQS, KAFKA]
  - GCP, exclusively [PUBSUB, KAFKA]
  - Multi-cloud, Azure, on-premise, or no preference [KAFKA, RABBITMQ, REDIS-STREAMS]
- Do you need complex message routing, such as topic exchanges, dead-letter queues, priority queues, per-message TTL, or consumer acknowledgement with selective requeue?
  - YES: flexible routing topologies, dead-letter handling, or priority queues [RABBITMQ, KAFKA]
  - NO: simple FIFO delivery, fan-out, or point-to-point is sufficient [SQS, PUBSUB, REDIS-STREAMS, KAFKA]
- Do you need to sustain peak ingestion above 100,000 messages per second per topic?
  - YES: extreme throughput, event streaming at internet scale [KAFKA]
  - NO: moderate throughput is sufficient for our workload [RABBITMQ, SQS, PUBSUB, REDIS-STREAMS, KAFKA]
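One way to read the bracketed lists is as the set of candidates that survive each answer, so a full set of answers leaves the intersection of those sets. The sketch below works under that assumption; the page calls the mode "elimination" but does not spell out the rule, so `surviving` is a hypothetical helper and not the site's implementation. The candidate sets are copied from the questions above.

```python
from itertools import product

ALL = {"KAFKA", "RABBITMQ", "SQS", "PUBSUB", "REDIS-STREAMS"}

# Candidate sets per answer, copied from the question list above.
TREE = {
    "Q1": {"A": {"KAFKA", "REDIS-STREAMS"}, "B": set(ALL)},
    "Q2": {
        "A": {"SQS", "KAFKA"},
        "B": {"PUBSUB", "KAFKA"},
        "C": {"KAFKA", "RABBITMQ", "REDIS-STREAMS"},
    },
    "Q3": {
        "A": {"RABBITMQ", "KAFKA"},
        "B": {"SQS", "PUBSUB", "REDIS-STREAMS", "KAFKA"},
    },
    "Q4": {"A": {"KAFKA"}, "B": set(ALL)},
}


def surviving(answers):
    """Intersect the candidate sets of the chosen answers (assumed semantics)."""
    remaining = set(ALL)
    for question, choice in answers.items():
        remaining &= TREE[question][choice]
    return remaining


# Replay needed, multi-cloud, simple routing, moderate throughput.
replay_multicloud = surviving({"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "B"})

# No replay, multi-cloud, complex routing, moderate throughput.
routing_heavy = surviving({"Q1": "B", "Q2": "C", "Q3": "A", "Q4": "B"})

# Which candidates survive every possible combination of answers?
always_present = set(ALL)
for combo in product(*(TREE[q] for q in TREE)):
    always_present &= surviving(dict(zip(TREE, combo)))
```

Under this reading Kafka appears in every answer's list, so it is never eliminated, and the other brokers only compete with it on paths where they also survive.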
Possible outcomes
- Apache Kafka
- RabbitMQ
- Amazon SQS
- Google Cloud Pub/Sub
- Redis Streams
How to use this decision tree
Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.