1. Overview
In this tutorial, we’ll learn the basics of Kafka – the use cases and core concepts anyone should know. We can then find and understand more detailed articles about Kafka.
2. What Is Kafka?
Kafka is an open-source stream processing platform developed by the Apache Software Foundation. We can use it as a messaging system to decouple message producers and consumers, but in comparison to “classical” messaging systems like ActiveMQ, it is designed to handle real-time data streams and provides a distributed, fault-tolerant, and highly scalable architecture for processing and storing data.
Therefore, we can use it in various use cases:
- Real-time data processing and analytics
- Log and event data aggregation
- Monitoring and metrics collection
- Clickstream data analysis
- Fraud detection
- Stream processing in big data pipelines
3. Set Up a Local Environment
If we’re dealing with Kafka for the first time, we might like to have a local installation to experience its features. We can get one up and running quickly with the help of Docker.
3.1. Install Kafka
We download an existing image and run a container instance with this command:
docker run -p 9092:9092 -d bashj79/kafka-kraft
This will make the so-called Kafka broker available on the host system at port 9092. Now, we would like to connect to the broker using a Kafka client. There are multiple clients that we can use.
3.2. Use Kafka CLI
The Kafka CLI is part of the installation and is available within the Docker container. We can use it by connecting to the container’s bash.
First, we need to find out the container’s name with this command:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7653830053fa bashj79/kafka-kraft "/bin/start_kafka.sh" 8 weeks ago Up 2 hours 0.0.0.0:9092->9092/tcp awesome_aryabhata
In this sample, the name is awesome_aryabhata. We then connect to the bash using:
docker exec -it awesome_aryabhata /bin/bash
Now, we can, for example, create a topic (we’ll clarify this term later) and list all existing topics with these commands:
cd /opt/kafka/bin
# create topic 'my-first-topic'
sh kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-first-topic --partitions 1 --replication-factor 1
# list topics
sh kafka-topics.sh --bootstrap-server localhost:9092 --list
# send messages to the topic
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-first-topic
>Hello World
>The weather is fine
>I love Kafka
3.3. Use Offset Explorer
The Offset Explorer (formerly: Kafka Tool) is a GUI application for managing Kafka. We can download and install it quickly. Then, we create a connection and specify the host and port of the Kafka broker:
Then, we can explore the architecture:
3.4. Use UI for Apache Kafka (Kafka UI)
The UI for Apache Kafka (Kafka UI) is a web UI, implemented with Spring Boot and React, and provided as a Docker container for a simple installation with the following command:
docker run -it -p 8080:8080 -e DYNAMIC_CONFIG_ENABLED=true provectuslabs/kafka-ui
We can then open the UI in the browser using http://localhost:8080 and define a cluster, as this picture shows:
Because the Kafka broker runs in a different container than the Kafka UI’s backend, it will not have access to localhost:9092. We could instead address the host system using host.docker.internal:9092, but this is just the bootstrapping URL.
Unfortunately, Kafka itself will return a response that leads to a redirection to localhost:9092 again, which won’t work. If we do not want to reconfigure Kafka (because this would break the other clients), we need to create a port forwarding from the Kafka UI container’s port 9092 to the host system’s port 9092. The following sketch illustrates the connections:
We can set up this container-internal port forwarding, e.g., using socat. We have to install it within the container (Alpine Linux), so we need to connect to the container’s bash with root permissions. We need these commands, starting from the host system’s command line:
# Connect to the container's bash (find out the name with 'docker ps')
docker exec -it --user=root <name-of-kafka-ui-container> /bin/sh
# Now, we are connected to the container's bash.
# Let's install 'socat'
apk add socat
# Use socat to create the port forwarding
socat tcp-listen:9092,fork tcp:host.docker.internal:9092
# This will lead to a running process that we don't kill as long as the container's running
Unfortunately, we need to run socat each time we start the container. Another possibility would be to extend the Dockerfile accordingly.
Now, we can specify localhost:9092 as the bootstrap server within the Kafka UI and should be able to view and create topics, as shown below:
3.5. Use Kafka Java Client
We have to add the following Maven dependency to our project:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.5.1</version>
</dependency>
We can then connect to Kafka and consume the messages we produced before:
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// specify connection properties
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "MyFirstConsumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// receive messages that were sent before the consumer started
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// create the consumer using props
try (final Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
    // subscribe to the topic
    final String topic = "my-first-topic";
    consumer.subscribe(Arrays.asList(topic));
    // poll messages from the topic and print them to the console
    consumer
      .poll(Duration.ofMinutes(1))
      .forEach(System.out::println);
}
Of course, there is an integration for the Kafka Client in Spring.
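For instance, with the spring-kafka integration (an additional dependency not shown in this tutorial), consuming can be reduced to an annotated listener method. This is just a minimal sketch; the bean name is hypothetical, and Spring Boot would pick up the broker address from its own configuration:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// a hypothetical listener bean; spring-kafka polls Kafka in the background
// and invokes this method for every received message
@Component
public class MyFirstTopicListener {

    @KafkaListener(topics = "my-first-topic", groupId = "MyFirstConsumer")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}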
4. Basic Concepts
4.1. Producers & Consumers
We can differentiate Kafka clients into consumers and producers. Producers send messages to Kafka, while consumers receive messages from Kafka. Consumers only receive messages by actively polling from Kafka; Kafka itself acts in a passive way. This allows each consumer to consume at its own pace without blocking Kafka.
Of course, there can be multiple producers and multiple consumers at the same time. And, of course, one application can contain both producers and consumers.
Consumers are part of a Consumer Group, which Kafka identifies by a simple name. Only one consumer of a consumer group will receive a given message. This allows scaling out consumers while keeping the guarantee of only-once message delivery.
The following picture shows multiple producers and consumers working together with Kafka:
4.2. Messages
A message (we can also call it a “record” or “event”, depending on the use case) is the fundamental unit of data that Kafka processes. Its payload can be in any binary format as well as in text formats like plain text, Avro, XML, or JSON.
Each producer has to specify a serializer to transform the message object into the binary payload format. Each consumer has to specify a corresponding deserializer to transform the payload back into an object within its JVM. We call these components SerDes for short. There are built-in SerDes, but we can implement custom SerDes too.
The following picture shows the payload serialization and deserialization process:
Additionally, a message can have the following optional attributes, which the producer sketch after this list illustrates:
- A key that also can be of any binary format. If we use keys, we also need SerDes. Kafka uses keys for partitioning (we’ll discuss this in more detail in the next chapter).
- A timestamp indicates when the message was produced. Kafka uses timestamps for ordering messages or to implement retention policies.
- We can apply headers to associate metadata with the payload. E.g., Spring adds type headers by default for serialization and deserialization.
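As a short sketch of these attributes (assuming the local broker from above and the built-in StringSerializer as SerDes for both key and value; the key, header name, and values are just example data), a producer could send a keyed message with an explicit timestamp and a custom header like this:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// built-in SerDes: serialize both key and value as strings
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
    // key "user-42" and an explicit timestamp; the partition (null) is left to the partitioner
    ProducerRecord<String, String> record =
      new ProducerRecord<>("my-first-topic", null, System.currentTimeMillis(), "user-42", "Hello World");
    // an optional header carrying metadata about the payload
    record.headers().add("source", "kafka-intro-tutorial".getBytes());
    producer.send(record);
}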
4.3. Topics & Partitions
A topic is a logical channel or category to which producers publish messages. Consumers subscribe to a topic to receive messages from it in the context of their consumer group.
By default, the retention policy of a topic is 7 days, i.e., after 7 days, Kafka deletes the messages automatically, regardless of whether they have been delivered to consumers or not. We can configure this if necessary.
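As a sketch of such a configuration change, we could use the Admin API from the same kafka-clients dependency to set the topic’s retention.ms; the seven days here are just an example value, and the call is simplified (error handling omitted):

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (Admin admin = Admin.create(adminProps)) {
    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-first-topic");
    // set the retention of 'my-first-topic' to 7 days, expressed in milliseconds
    AlterConfigOp setRetention = new AlterConfigOp(
      new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)), AlterConfigOp.OpType.SET);
    admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
}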
Topics consist of partitions (at least one). To be exact, messages are stored in one partition of the topic. Within one partition, each message gets a sequence number (the offset). This ensures that messages are delivered to the consumer in the same order as they were stored in the partition. And, by storing the offsets that a consumer group has already received, Kafka guarantees only-once delivery.
By using multiple partitions, Kafka can provide both ordering guarantees and load balancing over a pool of consumer processes.
One consumer will be assigned to one partition when it subscribes to the topic, e.g. with the Java Kafka client API, as we have already seen:
String topic = "my-first-topic";
consumer.subscribe(Arrays.asList(topic));
However, for a consumer, it is possible to choose the partition(s) it wants to poll messages from:
TopicPartition myPartition = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(myPartition));
The disadvantage of this variant is that all consumers of the group have to use it, so automatically assigning partitions to group consumers won’t work in combination with single consumers that connect to a specific partition. Also, rebalancing is not possible in case of architectural changes like adding further consumers to the group.
Ideally, we have as many consumers as partitions, so that every consumer can be assigned to exactly one of the partitions, as shown below:
If we have more consumers than partitions, the excess consumers won’t receive messages from any partition:
If we have fewer consumers than partitions, consumers will receive messages from multiple partitions, which conflicts with optimal load balancing:
Producers do not necessarily send messages to only one partition. Every produced message is assigned to one partition automatically, following these rules:
- Producers can specify a partition as part of the message. If they do so, this has the highest priority
- If the message has a key, partitioning is done by calculating the hash of the key. Keys with the same hash will be stored in the same partition. Ideally, we have at least as many hashes as partitions
- Otherwise, the Sticky Partitioner distributes the messages to partitions
Again, storing messages in the same partition retains the message ordering, while storing messages in different partitions leads to parallel processing at the cost of ordering.
If the default partitioning does not match our expectations, we can simply implement a custom partitioner. To do so, we implement the Partitioner interface and register it during the initialization of the producer:
Properties producerProperties = new Properties();
// ...
producerProperties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyCustomPartitioner.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(producerProperties);
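MyCustomPartitioner is not part of Kafka; as a hypothetical sketch, such a class implements the three methods of the Partitioner interface. This example routes messages without a key to the last partition and distributes keyed messages by the hash of the key:

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class MyCustomPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // messages without a key always go to the last partition
            return numPartitions - 1;
        }
        // keyed messages are distributed by the (positive) murmur2 hash of the key
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no custom configuration needed
    }
}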
The following picture shows producers and consumers and their connections to the partitions:
Each producer has its own partitioner, so if we want to ensure that messages are partitioned consistently within the topic, we have to ensure that the partitioners of all producers work the same way, or we should only work with a single producer.
Partitions store messages in the order they arrive at the Kafka broker. Typically, a producer does not send each message as a single request but sends multiple messages within a batch. If we need to ensure the order of the messages and only-once delivery within one partition, we need transaction-aware producers and consumers.
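As a rough sketch (the transactional.id below is just an example value), a transaction-aware producer declares a transactional.id and wraps its sends into a transaction, while a transaction-aware consumer additionally sets its isolation.level to read_committed:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties txProps = new Properties();
txProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
txProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
txProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
txProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-producer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(txProps)) {
    producer.initTransactions();
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("my-first-topic", "key", "value within a transaction"));
    producer.commitTransaction();
}

// on the consumer side, we additionally set (in the consumer's Properties, e.g. props from above):
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");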
4.4. Clusters and Partition Replicas
As we have seen, Kafka uses topic partitions to allow parallel message delivery and load balancing of consumers. But Kafka itself must be scalable and fault-tolerant. So we typically do not use a single Kafka Broker, but a Cluster of multiple brokers. These brokers do not behave completely the same; rather, each of them is assigned special tasks that the rest of the cluster can absorb if one broker fails.
To understand this, we need to expand our understanding of topics. When creating a topic, we not only specify the number of partitions but also the number of brokers that jointly manage the partitions using synchronization. We call this the Replication Factor. For example, using the Kafka CLI, we could create a topic with 6 partitions, each of them synchronized on 3 brokers:
sh kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-replicated-topic --partitions 6 --replication-factor 3
For example, a replication factor of three means that the cluster can tolerate up to two replica failures (N-1 resiliency). We have to ensure that we have at least as many brokers as the replication factor we specify. Otherwise, Kafka does not create the topic until the count of brokers increases.
For better efficiency, replication of a partition only occurs in one direction. Kafka achieves this by declaring one of the brokers the Partition Leader. Producers only send messages to the partition leader, and the leader then synchronizes with the other brokers. Consumers also poll from the partition leader because the consumer group’s advancing offset has to be synchronized too.
Partition leadership is distributed across multiple brokers. Kafka tries to find different brokers for different partitions. Let’s look at an example with four brokers and two partitions with a replication factor of three:
Broker 1 is the leader of Partition 1, and Broker 4 is the leader of Partition 2. So each client will connect to those brokers when sending or polling messages from these partitions. To get information about the partition leaders and the other available brokers (metadata), there is a special bootstrapping mechanism. In summary, every broker can provide the cluster’s metadata, so the client can initialize the connection with any of these brokers and will then be redirected to the partition leaders. That’s why we can specify multiple brokers as bootstrapping servers.
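For example, nothing stops us from listing several brokers in the bootstrap configuration of our Java consumer from above (the broker hostnames are placeholders for a real cluster):

props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");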
If one partition-leading broker fails, Kafka will declare one of the still-working brokers as the new partition leader. Then, all clients have to connect to the new leader. In our example, if Broker 1 fails, Broker 2 becomes the new leader of Partition 1. Then, the clients that were connected to Broker 1 have to switch to Broker 2.
Kafka uses KRaft (in earlier versions: ZooKeeper) for the orchestration of all brokers within the cluster.
4.5. Putting It All Together
If we put producers and consumers together with a cluster of three brokers that manage a single topic with three partitions and a replication factor of 3, we’ll get this architecture:
5. Ecosystem
We already know that multiple clients are available to connect with Kafka, like a CLI, a Java-based client with integration into Spring applications, and multiple GUI tools. Of course, there are further client APIs for other programming languages (e.g., C/C++, Python, or JavaScript), but those are not part of the Kafka project.
Built on top of these APIs, there are further APIs for special purposes.
5.1. Kafka Connect API
Kafka Connect is an API for exchanging data with third-party systems. There are existing connectors e.g. for AWS S3, JDBC, or even for exchanging data between different Kafka clusters. And of course, we can write custom connectors too.
5.2. Kafka Streams API
Kafka Streams is an API for implementing stream processing applications that get their input from a Kafka topic, and store the result in another Kafka topic.
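As a minimal sketch (assuming the additional kafka-streams dependency), the following topology reads from my-first-topic, converts every value to upper case, and writes the result to a hypothetical output topic:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-app");
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
streamsProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
streamsProps.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("my-first-topic");
// transform each value and write the result to another topic
input.mapValues(value -> value.toUpperCase()).to("my-first-topic-uppercase");

KafkaStreams streams = new KafkaStreams(builder.build(), streamsProps);
streams.start();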
5.3. KSQL
KSQL is an SQL-like interface built on top of Kafka Streams. It does not require us to develop Java code; instead, we use SQL-like syntax to define stream processing of messages that are exchanged with Kafka. For this, we use ksqlDB, which connects to the Kafka cluster. We can access ksqlDB with a CLI or with a Java client application.
5.4. Kafka REST Proxy
The Kafka REST proxy provides a RESTful interface to a Kafka cluster. This way, we do not need any Kafka clients and avoid using the native Kafka protocol. It allows web frontends to connect with Kafka and makes it possible to use network components like API gateways or firewalls.
5.5. Kafka Operators for Kubernetes (Strimzi)
Strimzi is an open-source project that provides a way to run Kafka on Kubernetes and OpenShift platforms. It introduces custom Kubernetes resources, making it easier to declare and manage Kafka-related resources in a Kubernetes-native way. It follows the Operator Pattern, i.e., operators automate tasks like provisioning, scaling, rolling updates, and monitoring of Kafka clusters.
6. Conclusion
In this article, we have learned that Kafka is designed for high scalability and fault tolerance. Producers collect messages and send them in batches, topics are divided into partitions to allow parallel message delivery and load balancing of consumers, and replication is done over multiple brokers to ensure fault tolerance.
As usual, all the code implementations are available over on GitHub.