Messaging platforms can handle asynchronous message management and process large amounts of data with high throughput, greatly improving data delivery reliability. However, messaging platforms come in two different types, message queuing (point-to-point) and streaming (pub-sub), depending on the receiver and how the message is handled, and with so many different options popping up, choosing one can be a challenge.
In this roundup, you'll learn the difference between the two types and compare four leading options: NATS, RabbitMQ, NSQ, and Kafka.
Messaging Platforms: What Are They?
Messaging platforms provide a way for microservices to communicate and exchange data reliably and efficiently. This reliable communication is crucial in a microservices architecture, where each service is responsible for managing one specific function.
First, let's look at the two main types of messaging platforms.
Message Queuing
Message queuing, also known as point-to-point communication, involves sending messages to a queue, where they are stored until the intended recipient processes them. The sender and receiver do not need to be connected at the same time, which facilitates asynchronous communication. Messages remain in the queue until they can be delivered and are generally deleted after that.
Generally, queuing is appropriate for use cases where there's no requirement for real-time delivery or processing, such as batch processing, order fulfillment, or offline data processing.
Streaming
Streaming, or the publish-subscribe (pub-sub) model, can route messages to multiple queues, where they can be consumed by multiple consumers that perform various operations based on the content of the messages. Message creators are called publishers or producers, message recipients are called subscribers or consumers, and messages are stored in logs or topics.
Consumers can subscribe to a log file or topic to receive a continuous stream of messages stored within. These messages can be accessed in real time or from any point in history, unlike queues, where messages are removed. Streaming is commonly used in real-time applications such as IoT and stock market order processing.
How Are These Messaging Platforms Compared?
This article compares four major messaging platforms while focusing on the following five characteristics:
- Message queuing model: How are messages passed between two parties? Is it via a stream or a queue?
- Delivery guarantee: Are messages always delivered at least once, or is this not always the case?
- Ordering guarantee: Are messages delivered in the order they were sent, or are they not?
- Throughput and latency: How many messages can the platform handle, and how fast is the communication? Keep in mind that all these systems can scale to handle increased throughput, and that results will vary based on your system configuration.
- Persistence and replayability: Does the platform store messages and allow for reprocessing if they were missed the first time?
These factors will determine how the platform can be used in your workflow and whether it's suitable for your use case.
NATS
NATS, or Neural Autonomic Transport System, is a cloud-native messaging platform that was first released in 2011. It was originally written in Ruby but has since been rewritten in Go to support increased throughput. With a minimal footprint of under 10 MB, it's a lightweight solution with a very straightforward setup process. It's suitable for edge deployments and can be deployed anywhere, including on VMs and in containers on a device.
Message Queuing Model
NATS follows a pub-sub model with extended support for models like request-reply and wildcard subscriptions.
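To make the model concrete, here's a minimal sketch of core NATS pub-sub and request-reply using the Go client, nats.go; the subject names and the local server address are illustrative assumptions rather than part of any particular deployment.

```go
package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a local NATS server (nats://127.0.0.1:4222 by default).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Subscribe to a subject; every subscriber on "orders.created" gets a copy.
	sub, err := nc.Subscribe("orders.created", func(m *nats.Msg) {
		fmt.Printf("received: %s\n", string(m.Data))
	})
	if err != nil {
		panic(err)
	}
	defer sub.Unsubscribe()

	// Publish a message to the subject (fire-and-forget, at-most-once).
	if err := nc.Publish("orders.created", []byte(`{"id": 42}`)); err != nil {
		panic(err)
	}

	// Request-reply: send a request and wait up to a second for a response.
	if reply, err := nc.Request("orders.status", []byte("42"), time.Second); err == nil {
		fmt.Printf("reply: %s\n", string(reply.Data))
	}

	nc.Flush()
	time.Sleep(100 * time.Millisecond) // give the async handler time to run
}
```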
Delivery Assurance
The messaging platform offers at-most-once delivery, which makes it suitable for real-time, high-speed communication between microservices or other applications but not for long-term event storage. The at-most-once guarantee means that messages are delivered no more than once; if there is a failure in the delivery process, the message is lost.
This model is suitable for use cases where the loss of a few messages is acceptable as long as the overall system continues to operate correctly. It works particularly well in use cases that prioritize real-time communication over long-term data storage and persistence. For example, in a game like World of Warcraft, which has over 2 million players daily, it's important that players know where they are in the world, but not necessarily where they were a few seconds ago. NATS is a good fit for the volume of data that has to be handled every second in this scenario, so it's unsurprising that the game's developer, Activision Blizzard, uses it.
Ordering Assurance
NATS supports ordering assurance. Messages published by a single connection are received by the server and then delivered in the same order to all active subscriptions. If you have concurrent publishers, the order relative to those publishers is not known or guaranteed, but end-to-end ordering through a single connection is. A subsystem called JetStream adds a delivery guarantee, specifically "at least once."
JetStream stores messages received by publishers in an immutable order. As a result, these messages will be consumed and replayed in their original order.
Throughput and Latency
NATS is known for its high performance, low latency, and emphasis on simplicity. The rewrite in Go increased its throughput compared to the original Ruby implementation and makes NATS an ideal choice for demanding, real-time applications.
NATS has been shown in many benchmarks to be faster and more efficient than other messaging systems, with the ability to manage 3 million messages per second.
Persistence and Replayability
NATS's persistence engine, JetStream, provides persistence and replayability through two primitives called streams and consumers. Messages published to subjects bound to a stream are persisted and support a variety of retention models. One or more consumers can be created to manage the delivery of messages to clients when they are connected. Consumers support a few different acknowledgment and redelivery policies to accommodate a wide variety of use cases. This provides an "at least once" delivery guarantee and makes it possible to implement an "exactly once" processing model, as well as a Kafka-like processing model.
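As a rough illustration of how streams and consumers fit together, the following sketch uses the nats.go JetStream API; the stream name, subjects, and durable consumer name are hypothetical.

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Obtain a JetStream context for persisted, at-least-once messaging.
	js, err := nc.JetStream()
	if err != nil {
		panic(err)
	}

	// Create a stream that captures and stores every message on "orders.*".
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "ORDERS",
		Subjects: []string{"orders.*"},
	}); err != nil {
		panic(err)
	}

	// Publishing through JetStream returns only after the message is stored.
	if _, err := js.Publish("orders.created", []byte(`{"id": 42}`)); err != nil {
		panic(err)
	}

	// A durable consumer replays the stream from the beginning and must
	// explicitly acknowledge each message (at-least-once delivery).
	_, err = js.Subscribe("orders.*", func(m *nats.Msg) {
		fmt.Printf("replayed: %s\n", string(m.Data))
		m.Ack()
	}, nats.Durable("order-worker"), nats.DeliverAll(), nats.ManualAck())
	if err != nil {
		panic(err)
	}

	select {} // keep the process alive so the subscription keeps receiving
}
```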
Limitations
Even though NATS supports batch publishing, it doesn't support atomic batch publishing. This means that messages published in the same batch are not guaranteed to be treated atomically and succeed or fail in an all-or-nothing fashion. Core NATS, with no persistence, can achieve high throughput in batch publishing, but with the persistence layer (a replicated, file-backed stream), it caps out at around 200,000 messages per second.
For use cases that require massive throughput and ingestion of, for example, clickstream data or events, a Kafka cluster can currently still scale better than a NATS cluster. But that is what Kafka was originally designed for, whereas NATS started as a lightweight messaging system and has evolved to address a richer set of use cases, including streaming. Considering NATS wasn't created with huge throughput capacity in mind, its ability to handle substantial loads is still decent and continues to improve.
RabbitMQ
RabbitMQ is a popular open-source messaging broker written in Erlang that has been available since 2007. It implements the AMQP 0-9-1 model, a technology-agnostic protocol that supports interoperability between different systems.
It operates on a push-based message delivery system, where the broker sends messages to consumers rather than requiring the consumers to actively retrieve them. The platform also offers a very rich user interface with fine-grained administration and monitoring capabilities, which is a great addition.
Message Queuing Model
RabbitMQ is a unique messaging platform with an architecture that supports both point-to-point and pub-sub messaging patterns. It operates through the use of exchanges, to which producers publish messages. Consumers subscribe to the corresponding queues to receive the messages.
This architecture provides flexibility in terms of message delivery, allowing producers and consumers to use the pattern that best fits their needs. However, the main objective is for it to act as a pub-sub system.
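The sketch below shows this flow with the Go AMQP client (rabbitmq/amqp091-go): a fanout exchange copies messages to every bound queue (pub-sub), while publishing to the default exchange with a queue name as the routing key behaves as point-to-point. The exchange, queue, and connection details are illustrative assumptions.

```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Pub-sub: a fanout exchange copies every message to all bound queues.
	if err := ch.ExchangeDeclare("events", "fanout", true, false, false, false, nil); err != nil {
		log.Fatal(err)
	}

	// Each subscriber declares its own queue and binds it to the exchange.
	q, err := ch.QueueDeclare("analytics", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}
	if err := ch.QueueBind(q.Name, "", "events", false, nil); err != nil {
		log.Fatal(err)
	}

	// Producers publish to the exchange, not directly to a queue.
	err = ch.Publish("events", "", false, false, amqp.Publishing{
		ContentType: "text/plain",
		Body:        []byte("order created"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Point-to-point: publishing to the default exchange ("") with a queue
	// name as the routing key delivers the message to exactly that queue.
	_ = ch.Publish("", q.Name, false, false, amqp.Publishing{Body: []byte("direct task")})
}
```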
Delivery Guarantee
RabbitMQ does not guarantee exactly-once delivery by default. Achieving exactly-once delivery is a complex task because it requires sending an acknowledgment to remove a message from the queue. If the processing system encounters an issue after receiving the message but before sending the acknowledgment, the message may be delivered again, potentially resulting in duplicate processing.
However, RabbitMQ provides several mechanisms for ensuring that messages are delivered reliably, such as publisher confirms and transactions.
At-least-once delivery is supported, which gives RabbitMQ reliable delivery guarantees. One of its key strengths is support for complex message routing, where messages can be routed based on rules and conditions, making it a very reliable and extensible platform.
Ordering Guarantee
In RabbitMQ, messages are stored in queues and can be consumed by multiple consumers. It guarantees the consumption order of messages within a queue. As mentioned above, it's important to note that RabbitMQ requires an acknowledgment before removing a message from the queue. So, if a consumer fails, that message is placed at the beginning of the queue again and left for the next consumer to process.
In cases where you want certain consumers to only process certain types of messages, an exchange can be configured with specific filters. Your consumer can then connect to this exchange to ensure that certain message types are always processed in a specified order.
Throughput and Latency
RabbitMQ offers strong guarantees for message delivery and persistence. When messages are produced, they are confirmed only after they have been replicated to a majority of nodes and written to disk with fsync. This provides strong guarantees but can result in slower performance: RabbitMQ can process upwards of 60,000 messages per second, with higher latency than its counterparts.
On the other hand, when messages are consumed as soon as they arrive, they can be read before being written to disk, which results in improved performance.
Persistence and Replayability
RabbitMQ follows a standard store-and-forward pattern, allowing messages to be stored in RAM, on disk, or both. To ensure persistence, the producer can mark messages as persistent, and they will be written to disk when routed to a durable queue. This helps achieve message retention even after a restart or failure of the RabbitMQ server.
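A minimal sketch of this, again using the amqp091-go client, declares a durable queue, publishes a persistent message, and consumes with manual acknowledgments; the queue name and payloads are placeholders.

```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// A durable queue survives a broker restart...
	q, err := ch.QueueDeclare("task_queue", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	// ...and persistent messages routed to it are written to disk.
	err = ch.Publish("", q.Name, false, false, amqp.Publishing{
		DeliveryMode: amqp.Persistent,
		ContentType:  "text/plain",
		Body:         []byte("process this order"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Consume with manual acknowledgments (autoAck=false): a message is only
	// removed from the queue after Ack, and once acknowledged it cannot be
	// replayed; if the consumer dies before acking, RabbitMQ redelivers it.
	msgs, err := ch.Consume(q.Name, "", false, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}
	for d := range msgs {
		log.Printf("got: %s", d.Body)
		d.Ack(false)
	}
}
```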
However, the system doesn't allow replayability, as messages are removed from the queue once they are acknowledged.
Limitations
RabbitMQ faces difficulties in scaling horizontally, as it's unable to resolve conflicts in queues that may arise from split-brain scenarios, which can occur during network failures and high-load situations and cause conflicts and inconsistencies between the split groups. Vertical scaling is available as an alternative, but it involves preplanning for capacity, which is not always feasible.
NSQ
NSQ is one of the most popular messaging platforms and the successor to simplequeue. It's popular due to its ease of use and efficient handling of high-volume, real-time data streams.
NSQ consists of two main components: nsqd and nsqlookupd. The nsqd (NSQ daemon) is responsible for accepting, storing, and dispatching messages, while nsqlookupd manages topology information and helps to maintain network connections. Additionally, NSQ includes an administrative user interface called nsqadmin for web-based data visualization and tasks.
Message Queuing Model
NSQ consists of topics, each of which contains channels. Every channel receives a duplicate of the messages for a specific topic, making it a pub-sub model for message delivery.
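A short sketch with the go-nsq client illustrates the topic/channel model; the topic, channel, and addresses (nsqd on 4150, nsqlookupd on 4161, the default ports) are assumptions for a local setup.

```go
package main

import (
	"log"

	"github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()

	// Producer publishes to a topic on a local nsqd.
	producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
	if err != nil {
		log.Fatal(err)
	}
	if err := producer.Publish("clicks", []byte(`{"page": "/home"}`)); err != nil {
		log.Fatal(err)
	}

	// Each channel on a topic receives its own copy of every message;
	// consumers on the same channel share (load-balance) that copy.
	consumer, err := nsq.NewConsumer("clicks", "analytics", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		log.Printf("got: %s", m.Body)
		return nil // returning nil sends FIN; an error triggers a requeue
	}))

	// Discover nsqd nodes through nsqlookupd.
	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
		log.Fatal(err)
	}

	select {} // block so the consumer keeps receiving messages
}
```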
Delivery Guarantee
NSQ is designed with a distributed architecture around the concept of topics, which allows messages to be organized and distributed across the cluster. To ensure reliable delivery, NSQ requires consumers to acknowledge (finish) each message; messages that are not acknowledged within a timeout are requeued and redelivered, providing an at-least-once guarantee.
Ordering Guarantee
NSQ provides a message delivery guarantee; however, it does not guarantee the order of messages that are published to a topic and channel. This means that the order in which messages are received by a consumer may not match the order in which they were published.
The absence of a strict message ordering requirement in NSQ reduces the necessity of establishing interconnections between all instances of nsqd to synchronize and sort messages, leading to improved performance.
Throughput and Latency
NSQ is a high-performance messaging platform that has been successfully used by companies like Twilio to process up to 800,000 messages per second with low latency. While it is a popular choice for certain use cases, NSQ might not be the best fit for workloads that demand the very highest throughput, as its benchmarks are lower than some of the other options covered here.
However, if this level of throughput meets your requirements and you want a feature-rich dashboard, NSQ can be quite a good choice.
Persistence and Replayability
NSQ has limitations in terms of persistence: it only writes messages to disk when memory is low or when messages are archived by consumers. In the event of node failure, in-memory messages may be lost, and because NSQ deletes messages immediately upon receiving the finish signal from the consumer, there is no way to recover them.
Archiving messages is possible using the built-in nsq_to_file utility, but it does not provide any built-in replay functionality.
Limitations
NSQ lacks built-in replication and clustering, which means it does not create multiple copies of data across different nodes in a network. Moreover, a heartbeat mechanism is used to detect whether consumers are alive or dead, which is not ideal because it doesn't ensure idempotency, so consumers must be prepared to handle duplicate messages.
Kafka
Kafka is one of the leading open-source streaming platforms. It's currently governed by the Apache Software Foundation and is written in Scala. The platform was created at LinkedIn as a solution for handling real-time data streams and processing them in near-real time, and its first public release dates back to January 2011.
Message Queuing Model
The platform is built around a partitioned, append-only log and follows a publish-subscribe model: messages, or records, are stored in topics that support multiple subscribers, allowing a single topic to be consumed by zero or many consumers.
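As an illustration, here's a minimal producer and consumer-group reader using the segmentio/kafka-go client; the broker address, topic, and group ID are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	// Producer: write records to a topic.
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "page-views",
	}
	defer w.Close()
	if err := w.WriteMessages(ctx, kafka.Message{Value: []byte(`{"page": "/home"}`)}); err != nil {
		log.Fatal(err)
	}

	// Consumer: readers that share a GroupID split the topic's partitions
	// between them; separate groups each receive the full stream.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "page-views",
		GroupID: "analytics",
	})
	defer r.Close()

	m, err := r.ReadMessage(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("partition %d offset %d: %s", m.Partition, m.Offset, m.Value)
}
```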
Delivery Guarantee
Kafka allows consumers to subscribe to topics and receive records from their partitions. It provides different delivery guarantees for messages, including at-most-once, at-least-once, and exactly-once. The first two are standard across the industry, but the exactly-once guarantee is a more advanced feature that is achieved through the use of the idempotent producer and transaction APIs.
Exactly-once delivery provides a higher level of reliability, but it requires a more complex setup and careful management to ensure messages are not processed multiple times in the event of transaction failures. On the consumer side, only records from committed transactions are read when isolation.level is set to read_committed.
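This maps to an isolation-level setting on the consumer. The sketch below assumes the segmentio/kafka-go client and its IsolationLevel reader option; the topic and group names are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// A reader that skips records from aborted or still-open transactions,
	// the client-side equivalent of isolation.level=read_committed.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:        []string{"localhost:9092"},
		Topic:          "payments",
		GroupID:        "billing",
		IsolationLevel: kafka.ReadCommitted,
	})
	defer r.Close()

	m, err := r.ReadMessage(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("committed record: %s", m.Value)
}
```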
Ordering Guarantee
Kafka is highly recommended for applications where maintaining order is a critical requirement, for example, in log aggregation, complex event processing (CEP), and staged event-driven architecture (SEDA).
In these use cases, Kafka provides an ordering guarantee for messages within a partition, ensuring that messages are delivered to consumers in the same order as they were produced. This is achieved by maintaining an ordered sequence of records within a partition, ensuring the exact order of messages is preserved.
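A common way to exploit this is to give related records the same key so they land on the same partition and keep their relative order. The sketch below, again with kafka-go, uses a hash balancer for that purpose; the topic and keys are illustrative.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Records that share a key are hashed to the same partition, so Kafka
	// preserves their relative order for any consumer of that partition.
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"),
		Topic:    "orders",
		Balancer: &kafka.Hash{}, // key-based partition assignment
	}
	defer w.Close()

	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("customer-42"), Value: []byte("order placed")},
		kafka.Message{Key: []byte("customer-42"), Value: []byte("order paid")},
		kafka.Message{Key: []byte("customer-42"), Value: []byte("order shipped")},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```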
Throughput and Latency
A Kafka topic is designed to support multiple partitions, which enables concurrency and, in turn, increases the throughput of the platform. The partitions inside the topic store messages on disk, and consumers attached to the topic read them in order. Because Kafka relies on the much faster OS page cache for its I/O, throughput increases significantly when messages are read as they are produced, since they are served from memory rather than from disk.
Unfortunately, the Kafka project doesn't publish an official throughput figure, though some tests indicate it can handle at least 350,000 messages per second. It is, however, highly scalable, and a recent podcast by Confluent explains how one company handles 2 million messages per second.
Persistence and Replayability
Kafka supports persistence and replayability out of the box. The data persistence is maintained by replicating messages across multiple nodes in the cluster, which provides high availability and fault tolerance. The retention period for messages can be configured, and messages within the retention period can be replayed, making it useful for scenarios such as debugging or data recovery. For example, if the period is set for a week, replay scenarios should work for any messages up to a week old.
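For example, a reader attached directly to a partition (without a consumer group) can seek back to the oldest retained offset and replay everything still within the retention window. This sketch assumes the kafka-go client; the topic and broker address are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Reading a single partition directly (no consumer group) allows seeking.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"},
		Topic:     "page-views",
		Partition: 0,
	})
	defer r.Close()

	// Seek to the oldest offset still retained on disk and replay from there.
	if err := r.SetOffset(kafka.FirstOffset); err != nil {
		log.Fatal(err)
	}

	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			break // e.g., context cancelled or connection closed
		}
		log.Printf("offset %d: %s", m.Offset, m.Value)
	}
}
```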
The durability of messages can be increased by raising the topic's replication factor and configuring the producer to require acknowledgments from all in-sync replicas (acks=all), which ensures that messages have been successfully replicated before they are confirmed.
Kafka's durability and its ability to persist data even in the case of node failures make it a dependable platform and a popular choice for storing and processing real-time data.
Limitations
One of the biggest drawbacks of Apache Kafka is the architecture that makes it so efficient. The combination of brokers and ZooKeeper nodes, along with numerous configurable options, can make it difficult for new teams to set up and manage without encountering performance issues or data loss. However, since version 3.3.1, Kafka can run without ZooKeeper by using KRaft mode, which simplifies operations and improves performance.
Additionally, Kafka's architecture is not suitable for use with remote procedure calls (RPCs), as having an RPC in the middle of your process could slow it down if you're calling it synchronously and waiting for a response. This is because Kafka is designed for fast (or at least evenly performant) consumers and is not optimized for scenarios where individual consumers have widely varying processing speeds.
These complexities can make it challenging for teams that are new to Kafka to get the most out of the platform, and you may require specialized expertise to set it up and manage it effectively.
Summary: A Comparison Table of NATS, RabbitMQ, NSQ, and Kafka
The table below summarizes how the different message queuing systems (NATS, RabbitMQ, NSQ, and Kafka) compare across the parameters previously discussed:
| | NATS | RabbitMQ | NSQ | Kafka |
|---|---|---|---|---|
| Message queuing model | Pub-sub (with request-reply) | Point-to-point and pub-sub | Pub-sub (topics and channels) | Pub-sub |
| Delivery guarantee | At-most-once (core); at-least-once and exactly-once with JetStream | At-least-once | At-least-once | At-most-once, at-least-once, exactly-once |
| Ordering guarantee | Yes | Yes | No | Yes (within a partition) |
| Throughput | Up to 6 million messages per second | Up to 60,000 messages per second | Up to 800,000 messages per second | Up to 2 million messages per second |
| Persistence and replayability | Yes (with JetStream) | Persistent, but lacks replayability | Limited persistence, no replayability | Yes |
| Limitations | No atomic batch publish | Limited scalability, no replayability | Limited scalability and persistence, no replayability | Complex setup and management, not suitable for RPCs |
Final Thoughts
In this roundup, we explained what different messaging platforms have to offer and how they differ from one another. However, the most important factor in deciding your go-to platform should be your requirements and how experienced your infrastructure team is at maintaining the platform and adapting it for your infrastructure.
At the end of the day, the number one factor in determining the platform that best meets your needs is understanding your requirements. Start with what you want from the platform and work back to the things that are okay for you to sacrifice for your use case.
Kafka has emerged as an industry leader for event streaming use cases, so choosing it can be beneficial for you because of the ecosystem. However, if you are more interested in cloud-native technologies and want a simple solution, NATS can be quite useful. RabbitMQ is suitable for long-lasting tasks and integrating services with a very easy setup, and NSQ can be used for simplicity with some sacrifices on ordering.
Written by Hrittik Roy.