Messaging platforms can handle asynchronous message management and process large amounts of data with high throughput, greatly improving data delivery reliability. However, with two different types of platforms—message queuing (point-to-point) and streaming (pub-sub), distinguished by who receives a message and how it's handled—and so many different messaging platforms popping up, choosing one can be a challenge.
In this roundup, you’ll learn the difference between the two types and compare four leading options: NATS, RabbitMQ, NSQ, and Kafka.
Messaging platforms provide a way for microservices to communicate and exchange data reliably and efficiently. This reliable communication between services is crucial in a microservices architecture, as each service is responsible for managing one specific function.
First, let’s look at the two main types of messaging platforms.
Message queuing, also known as point-to-point communication, involves sending messages to a queue, where they are stored until the intended recipient processes them. The sender and receiver do not need to be connected at the same time, facilitating asynchronous communication. The messages are stored in the queue until they can be delivered, and they are generally deleted after that.
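As a rough illustration of this model (not tied to any specific broker), a point-to-point queue can be sketched as a buffer where each message waits until exactly one consumer takes it, after which it's gone:

```python
from collections import deque

class PointToPointQueue:
    """Minimal in-memory sketch of point-to-point messaging:
    each message is delivered to exactly one consumer and then deleted."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # The sender and receiver need not be connected at the same time;
        # the message simply waits in the queue until someone receives it.
        self._messages.append(message)

    def receive(self):
        # Delivering a message removes it from the queue for good.
        return self._messages.popleft() if self._messages else None

queue = PointToPointQueue()
queue.send("order-1")
queue.send("order-2")
print(queue.receive())  # "order-1" — first in, first out
print(queue.receive())  # "order-2"
print(queue.receive())  # None — messages are deleted after delivery
```

Note how the second `receive()` of the same message is impossible: once delivered, the message no longer exists in the queue, which is exactly what distinguishes queuing from streaming.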
Generally, queuing is appropriate for use cases where there’s no requirement for real-time delivery or processing, such as batch processing, order fulfillment, or offline data processing.
Streaming, or the publish-subscribe (pub-sub) model, routes messages to multiple queues that can be consumed by multiple consumers, who perform various operations based on the content of the messages. Message creators are called publishers or producers, message recipients are called subscribers or consumers, and messages are stored in logs or topics.
Consumers can subscribe to a log file or topic to receive a continuous stream of messages stored within. These messages can be accessed in real time or from any point in history, unlike queues, where messages are removed. Streaming is commonly used in real-time applications such as IoT and stock market order processing.
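The key difference from a queue is that the topic retains its messages. A simplified sketch (generic, not modeled on any particular platform's API) treats a topic as an append-only log where each subscriber tracks its own read position:

```python
class TopicLog:
    """Sketch of a pub-sub topic as an append-only log. Messages are
    retained, so each subscriber reads at its own offset and can
    replay from any point in history."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # subscriber name -> next position to read

    def publish(self, message):
        self._log.append(message)

    def subscribe(self, name, from_offset=0):
        # Unlike a queue, subscribing consumes nothing; it just records
        # where this subscriber starts reading.
        self._offsets[name] = from_offset

    def poll(self, name):
        # Return every message this subscriber hasn't seen yet.
        offset = self._offsets[name]
        new = self._log[offset:]
        self._offsets[name] = len(self._log)
        return new

topic = TopicLog()
topic.subscribe("dashboard")
topic.publish({"ticker": "ACME", "price": 101.5})
topic.publish({"ticker": "ACME", "price": 102.0})
print(topic.poll("dashboard"))        # both messages, in order
topic.subscribe("auditor", from_offset=0)
print(topic.poll("auditor"))          # same history replayed from the start
```

Because the log is never drained by consumption, a late subscriber (`auditor` above) can still read the full history—the property that makes streaming suitable for replay and audit scenarios.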
This article compares four major messaging platforms while focusing on the following five characteristics:
These factors will determine how the platform can be used in your workflow and whether it’s suitable for your use case.
NATS, or Neural Autonomic Transport System, is a cloud-native messaging platform that was first released in 2011. It was originally written in Ruby but has since been rewritten in Go to support increased throughput. With a minimal footprint of under 10 MB, it's a lightweight solution with a very straightforward setup process. It's suitable for edge deployments and can run anywhere, including on VMs and in containers.
NATS follows a pub-sub model with extended support for models like request-reply and wildcard subscriptions.
The messaging platform offers at-most-once delivery, which makes it suitable for real-time, high-speed communication between microservices or other applications but not for long-term event storage. The at-most-once delivery guarantee means that a message is delivered at most one time; if there is a failure in the delivery process, the message is lost.
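Conceptually, at-most-once delivery is fire-and-forget: a message is pushed to whoever is connected at that instant, and a failed or missed delivery is never stored or retried. A minimal sketch of the semantics (not the NATS client API):

```python
class AtMostOnceBus:
    """Sketch of at-most-once delivery: a message goes to whoever is
    subscribed right now; if nobody is connected or a handler fails,
    the message is simply dropped, never stored or retried."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            try:
                handler(message)
            except Exception:
                pass  # delivery failed: the message is lost, no retry

received = []
bus = AtMostOnceBus()
bus.publish("missed")            # nobody subscribed yet: dropped silently
bus.subscribe(received.append)
bus.publish("seen")
print(received)  # ['seen']
```

The upside of never persisting or retrying is that the hot path is extremely fast, which is why this guarantee pairs naturally with NATS's throughput numbers.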
This model is suitable for use cases where the loss of a few messages is acceptable as long as the overall system continues to operate correctly. It works particularly well in use cases that prioritize real-time communication over long-term data storage and persistence. For example, in a game like World of Warcraft, which has over 2 million players daily, it’s important that players know where they are in the world, but not necessarily where they were a few seconds ago. NATS is a good messaging tool for this kind of data volume that needs to be handled every second, so it’s unsurprising that the game’s developer, Activision Blizzard, uses it.
Messages from a single publisher are delivered to a subscriber in the order they're created. With multiple publishers, however, different subscribers may see messages in different orders, and there's no guarantee that messages will be delivered in the order they were published.
While out-of-order message delivery can be limiting in some use cases that require strict message ordering, it offers high performance and scalability benefits. For instance, you need message ordering to build control systems in a factory setting, but whether temperature readings from different sensors are delivered in a particular order is less critical.
For temperature, you need the latest measurement delivered promptly, but it doesn't matter if an older reading arrives late. By including timestamps in the messages so that readings with older timestamps can simply be discarded, you can mitigate the ordering challenge in cases where strict ordering isn't needed.
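The timestamp-discard trick from the paragraph above can be sketched in a few lines (the message fields `sensor`, `ts`, and `temp` are illustrative, not a real wire format):

```python
def latest_readings(messages):
    """Keep only the newest reading per sensor, discarding any message
    whose timestamp is older than one already seen for that sensor.
    This sidesteps out-of-order delivery when only freshness matters."""
    latest = {}
    for msg in messages:
        current = latest.get(msg["sensor"])
        if current is None or msg["ts"] > current["ts"]:
            latest[msg["sensor"]] = msg
    return latest

# Messages arrive out of order across publishers
arrived = [
    {"sensor": "kiln-a", "ts": 102, "temp": 450},
    {"sensor": "kiln-b", "ts": 101, "temp": 300},
    {"sensor": "kiln-a", "ts": 100, "temp": 445},  # stale: discarded
]
print(latest_readings(arrived)["kiln-a"]["temp"])  # 450
```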
Since its rewrite in Go, NATS has been known for its high performance, low latency, and emphasis on simplicity. The rewrite increased its throughput compared to the original Ruby implementation, making NATS an ideal choice for demanding, real-time applications.
NATS has been shown in many benchmarks to be faster and more efficient than other messaging systems, with the ability to manage 3 million messages per second.
NATS primarily stores messages in memory, which means there is a risk of data loss if the server is stopped. Upon restarting the server, running applications will no longer receive messages, and attempts to publish new messages will result in an error indicating an invalid publish request. This is because the server restart will erase any previous connection information.
However, to address the issue of missed messages, NATS offers a way to “catch up” on missed messages based on time, the number of messages you want to replay, or the sequence number from where you want to replay.
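The idea behind this catch-up mechanism can be sketched generically: a server keeps a sequence-numbered, timestamped buffer so a client can replay from a chosen sequence number or point in time. (This is an illustration of the concept only, not the NATS client API or wire format.)

```python
class ReplayableStream:
    """Sketch of catch-up semantics: the server retains a buffer of
    (sequence, timestamp, payload) entries so clients can replay missed
    messages by sequence number or by time."""

    def __init__(self):
        self._buffer = []
        self._seq = 0

    def publish(self, timestamp, payload):
        self._seq += 1  # monotonically increasing sequence number
        self._buffer.append((self._seq, timestamp, payload))

    def replay_from_seq(self, seq):
        # Replay everything from a given sequence number onward.
        return [p for s, _, p in self._buffer if s >= seq]

    def replay_since(self, timestamp):
        # Replay everything published at or after a given time.
        return [p for _, t, p in self._buffer if t >= timestamp]

stream = ReplayableStream()
stream.publish(10, "a")
stream.publish(20, "b")
stream.publish(30, "c")
print(stream.replay_from_seq(2))  # ['b', 'c']
print(stream.replay_since(30))    # ['c']
```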
While NATS is an excellent messaging solution for certain use cases due to its high throughput and low latency, its lack of ordering and delivery assurance, as well as its lack of persistence, can make it unsuitable for other use cases where those features are essential.
For example, in use cases where message order is critical, such as financial transactions or stock trading, or in cases where message loss is not acceptable, such as in the case of an air traffic control system, NATS’s lack of ordering, delivery assurance, and persistence could be problematic. However, you need to choose a message queue that is specific to your needs, as one size doesn’t fit all, and the limitations here are just tradeoffs and design decisions.
RabbitMQ is a popular open-source messaging broker written in Erlang that has been available since 2007. It implements the AMQP 0-9-1 model, a technology-agnostic protocol that supports interoperability between different systems.
It operates on a push-based message delivery system, where the broker sends messages to consumers rather than requiring the consumers to actively retrieve them. The platform also offers a very rich user interface with fine admin and monitoring capabilities, which is a great addition.
RabbitMQ is a unique messaging platform with an architecture that supports both point-to-point and pub-sub messaging patterns. It operates through the use of exchanges, to which producers publish messages. Consumers subscribe to the corresponding queues to receive the messages.
This architecture provides flexibility in terms of message delivery, allowing producers and consumers to use the pattern that best fits their needs. However, the main objective is for it to act as a pub-sub system.
RabbitMQ does not guarantee exactly-once delivery by default. Achieving exactly-once delivery is a complex task because it requires an acknowledgment to remove a message from the queue. If the processing system encounters an issue after receiving the message but before sending the acknowledgment, the message may be delivered again, potentially resulting in duplicate processing.
However, RabbitMQ provides several mechanisms for ensuring that messages are delivered reliably, such as publisher confirms and transactions.
At-least-once delivery is supported, which makes it reliable with delivery guarantees. One of the key strengths of RabbitMQ is its support for complex message routing, where messages can be routed based on rules and conditions, making it a very reliable and extensible platform.
In RabbitMQ, messages are stored in queues and can be consumed by multiple consumers. It guarantees the consumption order of messages. As mentioned above, it’s important to note that RabbitMQ requires an acknowledgment before removing a message from the queue. So, if a consumer fails, that message will be at the beginning of the queue again and left for the next consumer to process.
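The acknowledge-before-removal behavior described above is what makes the delivery at-least-once. A minimal sketch of the semantics (the delivery-tag mechanism is modeled loosely on RabbitMQ's, but this is not the AMQP client API):

```python
class AckQueue:
    """Sketch of at-least-once delivery: a message stays 'in flight'
    until acknowledged; if the consumer fails, a negative ack returns
    it to the front of the queue for the next consumer."""

    def __init__(self):
        self._queue = []
        self._in_flight = {}  # delivery tag -> message awaiting ack
        self._next_tag = 0

    def publish(self, message):
        self._queue.append(message)

    def get(self):
        if not self._queue:
            return None
        self._next_tag += 1
        self._in_flight[self._next_tag] = self._queue.pop(0)
        return self._next_tag, self._in_flight[self._next_tag]

    def ack(self, tag):
        # Only now is the message removed for good.
        del self._in_flight[tag]

    def nack(self, tag):
        # Consumer failed: requeue the message at the front.
        self._queue.insert(0, self._in_flight.pop(tag))

q = AckQueue()
q.publish("invoice-7")
tag, msg = q.get()
q.nack(tag)            # consumer crashed before finishing
tag2, msg2 = q.get()   # the same message is redelivered
q.ack(tag2)
print(msg2)  # "invoice-7"
```

This also shows why duplicates can occur: if a consumer finishes its work but crashes before calling `ack`, the message is redelivered and processed twice.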
In cases where you want certain consumers to only process certain types of messages, an exchange can be configured with specific filters. Your consumer can then connect to this exchange to ensure that certain message types are always processed in a specified order.
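The filtering idea can be sketched as a tiny routing table, loosely modeled on RabbitMQ's topic exchanges (where `*` matches exactly one dot-separated word); the binding patterns and queue names below are illustrative:

```python
class TopicExchange:
    """Sketch of rule-based routing: bindings map a routing-key pattern
    to a queue, and a published message is copied into every queue whose
    pattern matches its routing key ('*' matches one word)."""

    def __init__(self):
        self._bindings = []  # (pattern words, destination queue)

    def bind(self, pattern, queue):
        self._bindings.append((pattern.split("."), queue))

    def publish(self, routing_key, message):
        key = routing_key.split(".")
        for pattern, queue in self._bindings:
            if len(pattern) == len(key) and all(
                p == "*" or p == k for p, k in zip(pattern, key)
            ):
                queue.append(message)

errors, all_logs = [], []
exchange = TopicExchange()
exchange.bind("log.error", errors)   # this consumer sees only errors
exchange.bind("log.*", all_logs)     # this one sees every log message
exchange.publish("log.error", "disk full")
exchange.publish("log.info", "started")
print(errors)    # ['disk full']
print(all_logs)  # ['disk full', 'started']
```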
RabbitMQ offers strong guarantees for message delivery and persistence. When messages are produced, they are confirmed only after they have been replicated to a majority of nodes and written to disk with fsync. This provides strong guarantees at the cost of speed: RabbitMQ can process upwards of 60,000 messages per second, but with higher latency than some of its counterparts.
On the other hand, when messages are consumed as soon as they arrive, reads are served from memory before the messages are written to disk, which improves performance.
RabbitMQ follows a standard store-and-forward pattern, allowing messages to be stored in RAM, on disk, or both. To ensure the persistence of messages, the producer can tag them as persistent, and they will be stored in a separate queue. This helps achieve message retention even after a restart or failure of the RabbitMQ server.
However, the system doesn’t allow replayability of the messages, as they are removed once the message is acknowledged.
RabbitMQ faces difficulties in scaling horizontally, as it's unable to resolve conflicts in queues that may arise from split-brain scenarios. These can occur during network failures and high-load situations, causing conflicts and inconsistencies between the split groups. Vertical scaling is available as an alternative, but it requires capacity preplanning, which is not always feasible.
NSQ is one of the most popular messaging platforms and the successor to simplequeue. It’s popular due to its ease of use and efficient handling of high-volume, real-time data streams.
NSQ consists of two main components: nsqd and nsqlookupd. The nsqd (NSQ daemon) is responsible for accepting, storing, and dispatching messages, while nsqlookupd manages topology information and helps to maintain network connections. Additionally, NSQ includes an administrative user interface called nsqadmin for web-based data visualization and tasks.
NSQ consists of topics, each of which contains channels. Every channel receives a duplicate of the messages for a specific topic, making it a pub-sub model for message delivery.
NSQ is designed with a distributed architecture around the concept of topics, which allows messages to be organized and distributed across the cluster. To improve delivery reliability, a producer can publish a copy of each message to multiple nsqd nodes, so that if one node fails or there's a network disruption, the message can still reach its intended recipients.
NSQ provides a message delivery guarantee; however, it does not guarantee the order of messages that are published to a topic and channel. This means that the order in which messages are received by a consumer may not match the order in which they were published.
The absence of a strict message ordering requirement in NSQ reduces the necessity of establishing interconnections between all instances of nsqd to synchronize and sort messages, leading to improved performance.
NSQ is a high-performance messaging platform that has been used by companies like Twilio to process up to 800,000 messages per second with low latency. While it's a popular choice for certain use cases, it may not be the best fit when maximum throughput is required, as its benchmarks trail some of the other options covered here.
However, if a feature-rich dashboard matters to you and NSQ's throughput meets your needs, it can be quite a good choice.
NSQ has limitations in terms of persistence, as it only maintains file durability in cases of low memory or when messages are archived by consumers. In the event of node failure, messages may be lost as NSQ deletes them immediately upon receiving the finish signal from the consumer. There is no way to recover these messages.
Archiving messages is possible using the built-in nsq_to_file utility, but it does not provide any built-in replay functionality.
NSQ lacks built-in replication and clustering, which means it cannot create multiple copies of data across different nodes in a network. Moreover, it relies on a heartbeat mechanism to detect whether consumers are alive or dead, which isn't ideal because redelivery after a missed heartbeat doesn't ensure idempotent processing.
Kafka is one of the leading open-source streaming platforms. It’s currently governed by the Apache Software Foundation and is written in Scala. The platform’s first public release dates back to January 2011 and was created and used inside LinkedIn as a solution that could handle real-time data streams and process them in near-real time.
The platform is built around the publish-subscribe model: records are stored in topics with support for multiple subscribers, allowing a single topic to be consumed by zero or many subscribers.
Kafka allows consumers to subscribe to topics and receive records from their partitions. It provides different delivery guarantees, including at-most-once, at-least-once, and exactly-once. The first two are industry standards, but exactly-once delivery is a more advanced feature that can be achieved through the idempotent producer and the transactions API.
Exactly-once delivery provides a higher level of reliability, but it requires a more complex setup and careful management to ensure records are not processed multiple times in the event of transaction failures. Consumers read only records from committed transactions by setting `isolation.level` to `read_committed`.
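The core idea behind Kafka's idempotent producer can be sketched simply: the broker remembers the last sequence number accepted per producer ID and silently drops duplicates caused by retries. (This is a conceptual illustration, not Kafka's actual broker logic or protocol.)

```python
class IdempotentLog:
    """Sketch of the idea behind an idempotent producer: the broker
    tracks the last sequence number per producer ID and discards
    duplicate retries, so each record is written exactly once."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> last sequence accepted

    def append(self, producer_id, sequence, record):
        if self._last_seq.get(producer_id, -1) >= sequence:
            return False  # duplicate retry: already written, ignore
        self._last_seq[producer_id] = sequence
        self.records.append(record)
        return True

log = IdempotentLog()
log.append("producer-1", 0, "debit $10")
log.append("producer-1", 0, "debit $10")  # network retry, deduplicated
log.append("producer-1", 1, "credit $10")
print(log.records)  # ['debit $10', 'credit $10']
```

Deduplicating on the broker side is what lets a producer retry aggressively after a timeout without risking a double write.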
Kafka is highly recommended for applications where maintaining order is a critical requirement, for example, in log aggregation, complex event processing (CEP), and staged event-driven architecture (SEDA).
In these use cases, Kafka provides an ordering guarantee for messages within a partition, ensuring that messages are delivered to consumers in the same order as they were produced. This is achieved by maintaining an ordered sequence of records within a partition, ensuring the exact order of messages is preserved.
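A rough sketch of how per-partition ordering follows from key-based partitioning (Kafka's real partitioner hashes keys with murmur2; Python's `hash()` below is just a stand-in, and the event names are illustrative):

```python
def partition_for(key, num_partitions):
    """Records with the same key always land in the same partition,
    so their relative order is preserved there. (Illustrative hash;
    Kafka uses murmur2 on the serialized key.)"""
    return hash(key) % num_partitions

partitions = [[] for _ in range(3)]
events = [
    ("user-42", "u42-login"),
    ("user-7", "u7-login"),
    ("user-42", "u42-logout"),
]
for key, value in events:
    partitions[partition_for(key, 3)].append(value)

# All of user-42's events sit in one partition, in production order
p = partitions[partition_for("user-42", 3)]
u42 = [v for v in p if v.startswith("u42-")]
print(u42)  # ['u42-login', 'u42-logout'] — per-key order preserved
```

There is no ordering guarantee *across* partitions, which is why choosing a good partition key (such as a user or account ID) is what turns "ordered within a partition" into "ordered for each entity you care about."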
A Kafka topic is designed to support multiple partitions, which enables concurrency and, in turn, increases the platform's throughput. The partitions inside a topic store messages on disk, and consumers attached to the topic read them in chronological order. Because Kafka leans heavily on the much faster OS page cache for I/O, throughput increases significantly when messages are read as soon as they are produced, before being flushed to disk.
The Kafka project doesn't publish an official throughput figure, though some tests indicate it can handle at least 350,000 messages per second. It is, however, highly scalable, and a recent podcast by Confluent explains how one company handles 2 million messages per second.
Kafka supports persistence and replayability out of the box. The data persistence is maintained by replicating messages across multiple nodes in the cluster, which provides high availability and fault tolerance. The retention period for messages can be configured, and messages within the retention period can be replayed, making it useful for scenarios such as debugging or data recovery. For example, if the period is set for a week, replay scenarios should work for any messages up to a week old.
The durability of messages can be increased by raising a topic's replication factor, setting `min.insync.replicas`, and producing with `acks=all`, which ensures records are acknowledged only after they have been successfully replicated and written.
Kafka’s message durability and persistence and ability to persist data even in case of node failures make it a dependable platform and a popular choice for storing and processing real-time data.
One of Apache Kafka's biggest drawbacks is the very architecture that makes it so efficient. The combination of brokers and ZooKeeper nodes, along with numerous configurable options, can make it difficult and complex for new teams to set up and manage without encountering performance issues or data loss. However, since version 3.3.1, Kafka can run without ZooKeeper using KRaft, which simplifies operations and improves performance.
Additionally, Kafka’s architecture is not suitable for use with remote procedure calls (RPCs), as having an RPC in the middle of your process could slow it down if you’re doing it synchronously and waiting for a response. This is because Kafka is designed for fast (or at least evenly performant) consumers and is not optimized for scenarios where individual consumers may have widely varying processing speeds.
These complexities can make it challenging for teams that are new to Kafka to get the most out of the platform, and you may require specialized expertise to set it up and manage it effectively.
The table below summarizes how the four messaging platforms—NATS, RabbitMQ, NSQ, and Kafka—compare across the parameters previously discussed:
| Message Queuing Model | NATS | RabbitMQ | NSQ | Kafka |
|---|---|---|---|---|
| Delivery guarantee | At-most-once | At-least-once | At-least-once | At-most-once, at-least-once, exactly-once |
| Throughput | Up to 3 million messages per second | Up to 60,000 messages per second | Up to 800,000 messages per second | Up to 2 million messages per second |
| Persistence and replayability | No | Persistent, but lacks replayability | No | Yes |
| Limitations | Limited ordering and delivery assurance, limited persistence | Limited scalability, no replayability | Limited scalability and persistence, no replayability | Complex setup and management, not suitable for RPCs |
In this roundup, we explained what different messaging platforms have to offer and how they differ from one another. However, the most important factor in deciding your go-to platform should be your requirements and how experienced your infrastructure team is at maintaining the platform and adapting it for your infrastructure.
At the end of the day, the number one factor in determining the platform that best meets your needs is understanding your requirements. Start with what you want from the platform and work back to the things that are okay for you to sacrifice for your use case.
Kafka has emerged as an industry leader for event streaming use cases, so choosing it can be beneficial for you because of the ecosystem. However, if you are more interested in cloud-native technologies and want a simple solution, NATS can be quite useful. RabbitMQ is suitable for long-lasting tasks and integrating services with a very easy setup, and NSQ can be used for simplicity with some sacrifices on ordering.
Written by Hrittik Roy.