Introduction to Redpanda

Last modified: March 14, 2024

1. Overview

In this tutorial, we’ll discuss a potent event streaming platform called Redpanda. It’s a competitor to Kafka, the de facto industry-standard streaming platform, and, interestingly, it’s also compatible with the Kafka APIs.

We’ll look at the key components, features, and use cases of Redpanda, create Java programs for publishing messages to a Redpanda topic, and then read those messages back from it.

2. Redpanda vs. Kafka

Since the makers of Redpanda claim that it competes with Kafka, let’s compare the two on a few important factors:

Developer Experience

  Redpanda:
  • Ships as a single binary package that is easy to install
  • No dependency on a JVM or third-party tools

  Kafka:
  • Depends on Zookeeper or KRaft
  • Installation demands more expertise from developers

Performance

  Redpanda:
  • 10 times faster than Kafka, thanks to its thread-per-core programming model
  • Written in C++
  • Can handle 1 GB/sec of writes per core
  • Supports automatic kernel tuning
  • p99.999 latency of 16 ms

  Kafka:
  • Developed long ago, and hence not optimized for modern multi-core CPUs
  • Written in Java
  • p99.999 latency of 1.8 sec

Cost

  Redpanda:
  • 6 times lower than Kafka

  Kafka:
  • Needs more infrastructure to support similar performance

Connectors

  Kafka:
  • Pretty mature, with many out-of-the-box connectors

Community Support

  Redpanda:
  • In terms of adoption, a long way to go compared to Kafka
  • Has a Slack channel

  Kafka:
  • Widely adopted across various industries, and hence has an extremely mature community

3. Redpanda Architecture

Redpanda’s architecture is not only simple but also extremely easy to grasp. It ships as a single binary installation package that’s easy to install, which gives developers a quick head start and is a big reason for its popularity. Moreover, it delivers an extremely high-performing streaming platform with great throughput.

3.1. Key Components and Features

Let’s dive into the key components and features of Redpanda that make it extremely robust and performant:

The control plane supports the Kafka API for managing the broker, creating messaging topics, publishing and consuming messages, and much more. Hence, legacy systems relying on Kafka can migrate to Redpanda with significantly less effort. However, a separate set of Admin APIs is used for managing and configuring the Redpanda cluster.

Redpanda supports tiered storage. This means we can configure it to offload or archive its data logs from its local cache to cheaper object storage in the cloud. Also, on demand from consumers, the data is moved back from remote object storage to the local cache in real time.
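
As a rough sketch, tiered storage is driven by cluster configuration properties. The property names below appear in Redpanda’s documentation; the bucket and region values are placeholders, not a working setup:

```yaml
# Sketch: cluster properties that enable offloading data logs to object storage
cloud_storage_enabled: true            # turn on tiered storage
cloud_storage_bucket: my-archive-bucket  # placeholder bucket name
cloud_storage_region: us-east-1          # placeholder region
```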

Redpanda has a Raft consensus algorithm implementation layer that replicates topic-partition data across its nodes. This feature prevents data loss in the event of a failure. Naturally, it guarantees high data safety and fault tolerance.
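
For intuition on why Raft replication tolerates failures: a write counts as committed once a majority of a partition’s replicas acknowledge it, so a partition with n replicas survives the loss of ⌊(n−1)/2⌋ of them. A quick back-of-the-envelope helper (purely illustrative, not Redpanda code):

```java
public class RaftQuorum {

    // Smallest number of replicas that must acknowledge a write before it is committed
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // How many replicas can fail while the partition stays available for writes
    static int tolerableFailures(int replicas) {
        return (replicas - 1) / 2;
    }

    public static void main(String[] args) {
        // A replication factor of 3 needs 2 acknowledgements and survives 1 failure
        System.out.println(quorum(3) + " " + tolerableFailures(3));
    }
}
```

This is why odd replication factors like 3 or 5 are the usual choice: going from 3 to 4 replicas raises the quorum without tolerating any extra failures.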

Redpanda has robust authentication and authorization support. It can authenticate external users and applications using methods such as SASL, OAuth, OpenID Connect (OIDC), basic authentication, Kerberos, and others. Additionally, it enables fine-grained access control over its resources through the Role Based Access Control (RBAC) mechanism.

Schemas are essential in defining the data exchanged between the Redpanda broker, consumers, and producers. Hence, the cluster has a Schema Registry. The Schema Registry API helps register and modify the schemas.
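
To illustrate the shape of a Schema Registry call, the sketch below builds an HTTP request that registers a trivial schema under a subject. The endpoint path follows the Confluent-compatible convention Redpanda’s registry exposes; the address localhost:8081 and the subject name are assumptions for this example:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SchemaRegistryExample {

    // Assumed registry address; 8081 is Redpanda's default Schema Registry port
    static final String REGISTRY_URL = "http://localhost:8081";

    // Builds a request registering a minimal string schema under the given subject
    static HttpRequest registerSchemaRequest(String subject) {
        String body = "{\"schema\": \"{\\\"type\\\": \\\"string\\\"}\"}";
        return HttpRequest.newBuilder()
          .uri(URI.create(REGISTRY_URL + "/subjects/" + subject + "/versions"))
          .header("Content-Type", "application/vnd.schemaregistry.v1+json")
          .POST(HttpRequest.BodyPublishers.ofString(body))
          .build();
    }

    public static void main(String[] args) {
        HttpRequest request = registerSchemaRequest("sensor-reading-value");
        // Actually sending it requires a running registry, e.g.:
        // java.net.http.HttpClient.newHttpClient().send(request, ...)
        System.out.println(request.method() + " " + request.uri());
    }
}
```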

The HTTP Proxy (pandaproxy) API provides a convenient way to interact with Redpanda for basic data operations like listing topics and brokers, getting events, producing events, and much more.
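
As a sketch, listing topics through the HTTP Proxy is a plain GET request. The address localhost:8082 (pandaproxy’s default port) and the media type are assumptions based on the Kafka REST conventions the proxy follows:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PandaproxyExample {

    // Assumed proxy address; Redpanda's HTTP Proxy listens on 8082 by default
    static final String PROXY_URL = "http://localhost:8082";

    // Builds a request that lists the topics known to the broker
    static HttpRequest listTopicsRequest() {
        return HttpRequest.newBuilder()
          .uri(URI.create(PROXY_URL + "/topics"))
          .header("Accept", "application/vnd.kafka.v2+json")
          .GET()
          .build();
    }

    public static void main(String[] args) {
        HttpRequest request = listTopicsRequest();
        // Sending it requires a running proxy, e.g.:
        // java.net.http.HttpClient.newHttpClient().send(request, ...)
        System.out.println(request.method() + " " + request.uri());
    }
}
```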

Finally, Redpanda provides metrics endpoints for monitoring. These can be scraped by Prometheus (a monitoring tool) to pull important metrics and display them on Grafana dashboards.
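
For illustration, a minimal Prometheus scrape job might look like the fragment below. The /public_metrics path and the admin port 9644 follow Redpanda’s documented defaults; the target host is a placeholder:

```yaml
# Sketch of a Prometheus job scraping a Redpanda node's metrics endpoint
scrape_configs:
  - job_name: redpanda
    metrics_path: /public_metrics   # served on the admin API port
    static_configs:
      - targets: ['localhost:9644'] # placeholder node address
```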

3.2. Single Binary Installation Package

Redpanda’s installation package comprises a single binary, so its installation is significantly simpler than Kafka’s. Unlike Kafka, it’s not dependent on a JVM or a cluster manager like Zookeeper. Due to these factors, operating Redpanda is remarkably easy.

It’s developed in C++ and has a compelling thread-per-core programming model that helps utilize the CPU cores, memory, and network optimally. Consequently, the hardware cost for its deployment is significantly reduced. This model also results in low latency and high throughput.
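
To get an intuition for the thread-per-core (shard-per-core) idea, the sketch below routes every key to exactly one single-threaded executor, so shard-local state never needs locks. This is plain Java and purely illustrative; Redpanda itself implements the model in C++:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShardPerCoreSketch {

    final List<ExecutorService> shards = new ArrayList<>();

    ShardPerCoreSketch(int cores) {
        for (int i = 0; i < cores; i++) {
            // One single-threaded event loop per core, mimicking a shard
            shards.add(Executors.newSingleThreadExecutor());
        }
    }

    // Every key is owned by exactly one shard, so its state is never contended
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void submit(String key, Runnable task) {
        shards.get(shardFor(key)).submit(task);
    }

    public static void main(String[] args) {
        ShardPerCoreSketch sketch = new ShardPerCoreSketch(Runtime.getRuntime().availableProcessors());
        sketch.submit("orders", () -> System.out.println("handled on shard " + sketch.shardFor("orders")));
        sketch.shards.forEach(ExecutorService::shutdown);
    }
}
```

Because a key always lands on the same shard, ordering per key comes for free and cross-core synchronization is avoided, which is the essence of the performance argument.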

Redpanda’s cluster comprises multiple nodes. Each node can be either a data plane or a control plane. All these nodes need is a single binary package installed on them with the appropriate configurations. If the nodes have high-end computing power, they can play both roles without performance bottlenecks.

3.3. Management Tools

Redpanda provides two management tools, a Web Console and a CLI called Redpanda Keeper (RPK). The Console is a user-friendly web application that cluster administrators can use.

RPK is mostly used for low-level cluster management and tuning, while the Console provides visibility into data streams along with the capability to troubleshoot and manage the cluster.

4. Deployment

Redpanda supports Self-hosted and Redpanda Cloud deployments.

In Self-hosted deployment, customers can deploy the Redpanda cluster inside their private data centers or in their VPCs in the public cloud. It can be deployed on physical or virtual machines and Kubernetes. As a rule of thumb, each broker should have its dedicated node. Currently, RHEL/CentOS and Ubuntu operating systems are supported.

Additionally, AWS Simple Storage Service (S3), Azure Blob Storage (ABS), and Google Cloud Storage (GCS) can be used for supporting tiered storage.

Interestingly, customers can also opt for Redpanda Cloud for managed services. They can either have the whole cluster completely on Redpanda Cloud or choose to own the data plane running in their private data centers or public cloud accounts. The control plane remains on the Redpanda Cloud where monitoring, provisioning, and upgrades are all taken care of.

5. Key Use Cases

Unlike Kafka, Redpanda is an extremely developer-friendly streaming platform because of its simple architecture and ease of installation. Let’s quickly look at its use cases along the same lines:

In general, the participants in a streaming platform are:

  • Source systems generate feeds
  • Feeds could be monitoring events, metrics, notifications, and more
  • Brokers in the cluster managing the topics
  • Producers read feeds from source systems and publish them to the topics
  • Consumers constantly poll on the subscribed topics
  • Target Systems receive the transformed messages from the consumers

Redpanda guarantees the delivery of live feeds from various sources, such as monitoring tools, compliance and security platforms, and IoT devices, to target systems with an incredible 10x lower average latency.

It supports the consumer and producer model for processing live feeds or events from various sources. The producers are applications that read data from source systems and publish it to topics in the Redpanda cluster. The brokers in the cluster are highly reliable and fault-tolerant, guaranteeing message delivery.

The consumer applications subscribe to the topics in the cluster. Eventually, they read the data from the topics and, after further transforming the data, send them to various target systems like analytics platforms, NoSQL databases, relational databases, or other streaming platforms.

In Microservice architecture, Redpanda helps decouple microservices by facilitating asynchronous communication between them.
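
The decoupling idea can be sketched without any broker at all: two services share only a queue (standing in for a topic) and never call each other directly. A minimal, purely illustrative Java version:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DecouplingSketch {

    // The queue stands in for a Redpanda topic: the services share it, never each other
    static final BlockingQueue<String> topic = new ArrayBlockingQueue<>(10);

    // The "producer" microservice publishes an event and returns immediately
    static void producerService() throws InterruptedException {
        topic.put("order-created:42");
    }

    // The "consumer" microservice picks the event up whenever it is ready
    static String consumerService() throws InterruptedException {
        return "processed " + topic.take();
    }

    public static void main(String[] args) throws Exception {
        Thread producer = new Thread(() -> {
            try {
                producerService();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(consumerService());
        producer.join();
    }
}
```

With a real broker in the middle, the producer additionally survives consumer downtime, since the topic durably buffers the events.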

Consequently, it can play a substantial role across industries in developing:

  • Observability platforms for event and log processing, reporting, troubleshooting, and auto-healing
  • Real-time compliance and fraud-detection systems
  • Real-time analytic dashboards and applications

6. Implement Redpanda Client With Kafka API

Notably, Redpanda supports the Kafka API. Hence, we’ll use the Kafka client to write programs that can interact with the Redpanda Stream.

For our examples, we’ve used Java Testcontainers to deploy a single-node Redpanda on a Windows desktop.

Furthermore, we’ll explore fundamental programs covering topic creation, message publishing, and message consumption. This is just for demonstration purposes and, hence, we won’t delve deeply into the Kafka API concepts.

6.1. Prerequisites

Before we begin, let’s import the necessary Maven dependency for the Kafka client library:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.6.1</version>
</dependency>

6.2. Create Topic

For creating a topic on Redpanda, we’ll first instantiate the AdminClient class from the Kafka client library:

AdminClient createAdminClient() {
    Properties adminProps = new Properties();
    adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    return KafkaAdminClient.create(adminProps);
}

To set up the AdminClient, we fetched the broker URL and passed it, via the adminProps properties, to the static KafkaAdminClient.create() method.

Now, let’s see how we create a topic:

void createTopic(String topicName) {
    try (AdminClient adminClient = createAdminClient()) {
        NewTopic topic = new NewTopic(topicName, 1, (short) 1);
        adminClient.createTopics(Collections.singleton(topic));
    } catch (Exception e) {
        LOGGER.error("Error occurred during topic creation:", e);
    }
}

The createTopics() method of the AdminClient class takes in the NewTopic object as an argument for creating a topic.

Finally, let’s take a look at the createTopic() method in action:

@Test
void whenCreateTopic_thenSuccess() throws ExecutionException, InterruptedException {
    String topic = "test-topic";
    createTopic(topic);
    try(AdminClient adminClient = createAdminClient()) {
        assertTrue(adminClient.listTopics()
          .names()
          .get()
          .contains(topic));
    }
}

The program creates the topic test-topic successfully on Redpanda. We also validate the presence of the topic in the broker with the method listTopics() of the AdminClient class.

6.3. Publish Message to a Topic

Understandably, the most basic requirement of a producer application is publishing messages to a topic. For this purpose, we’ll use a KafkaProducer:

KafkaProducer<String, String> createProducer() {
    Properties producerProps = new Properties();
    producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    return new KafkaProducer<String, String>(producerProps);
}

We instantiated the producer by supplying essential properties like the broker URL and the StringSerializer class to the KafkaProducer constructor.

Now, let’s use the producer to publish the messages to a topic:

void publishMessage(String msgKey, String msg, String topic, KafkaProducer<String, String> producer)
    throws ExecutionException, InterruptedException {
    ProducerRecord<String, String> record = new ProducerRecord<>(topic, msgKey, msg);
    producer.send(record).get();
}

After creating the ProducerRecord object, we pass it to the send() method of the KafkaProducer object to publish the message. The send() method operates asynchronously, and hence we call get() on the returned Future to block until the message is published.

Finally, let’s publish a message:

@Test
void givenTopic_whenPublishMsg_thenSuccess() {
    try (final KafkaProducer<String, String> producer = createProducer()) {
        assertDoesNotThrow(() -> publishMessage("test_msg_key_2", "Hello Redpanda!", "baeldung-topic", producer));
    }
}

First, we create the KafkaProducer object by invoking the method createProducer(). Then we publish the message “Hello Redpanda!” to the topic baeldung-topic by calling the method publishMessage() that we covered earlier.

6.4. Consume Message From a Topic

As a next step, we’ll first create a KafkaConsumer before we can consume the messages from the stream:

KafkaConsumer<String, String> createConsumer() {
    Properties consumerProps = new Properties();
    consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, getBrokerUrl());
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
    consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    return new KafkaConsumer<String, String>(consumerProps);
}

We instantiate the consumer by providing essential properties like the broker URL, the StringDeserializer class, and others to the KafkaConsumer constructor. Additionally, we ensure that the consumer will consume messages from offset 0 (“earliest”).

Moving on, let’s consume some messages:

@Test
void givenTopic_whenConsumeMessage_thenSuccess() {

    try (KafkaConsumer<String, String> kafkaConsumer = createConsumer()) {
        kafkaConsumer.subscribe(Collections.singletonList(TOPIC_NAME));

        while(true) {
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(1000));
            if(records.count() == 0) {
                continue;
            }
            assertTrue(records.count() >= 1);
            break;
        }
    }
}

The method, after creating a KafkaConsumer object, subscribes to a topic. Then, it polls the topic every 1000 ms to read messages from it. Here, for demonstration, we break out of the loop, but in the real world, applications continuously poll for messages and then process them further.

7. Conclusion

In this tutorial, we explored the Redpanda Streaming platform. Conceptually, it’s similar to Apache Kafka but much easier to install, monitor, and manage. Additionally, with less computing and memory resources, it can achieve extremely high performance with high fault tolerance.

However, Redpanda still has a considerable distance to cover in terms of industry adoption when compared to Kafka. Additionally, the community support for Redpanda is not as strong as that for Kafka.

Finally, applications can migrate to Redpanda from Kafka with considerably less effort because it’s compatible with the Kafka API.

As usual, the code used in this article is available over on GitHub.
