Redis Sentinel vs Clustering

Last modified: October 11, 2022

1. Overview

In this tutorial, we’ll talk about Redis and, more importantly, its two different deployment strategies, Redis Sentinel and Redis Cluster. Then, we’ll discuss the differences between those strategies and their nuances.

At the end of it, we hope to understand enough about Redis to judge which deployment strategy better fulfils our needs.

2. Introduction to Redis

Redis is an open-source in-memory data structure store that can be used as a key-value database, cache, and for many other use cases. It aims to provide high-speed access to data.

Our goal is to analyze and compare the two different strategies, Redis Sentinel and Redis Cluster.

Redis Sentinel is a separate process that Redis provides. It monitors the Redis instances and offers notifications, master discovery, automatic failover in case of failure, and master election via majority voting. In other words, Sentinel is a distributed system that adds capabilities such as high availability and failover to Redis. It runs alongside a standard Redis deployment.

Redis Cluster is a deployment strategy that scales even further. Similar to Sentinel, it provides failover, configuration management, etc. The difference is the sharding capabilities, which allow us to scale out capacity almost linearly up to 1000 nodes.

3. Basic Concepts

3.基本概念

To help us understand the nuances of both strategies, let’s first try to internalize some of the basic concepts and building blocks.

Some of these concepts aren’t strictly related to Redis Cluster or Sentinel, while others apply to both. Either way, knowing them will help us understand Redis.

3.1. Databases

Redis has support for multiple logical databases. Although still persisted in the same file, they allow the user to have the same key with different values in each database. They’re like different database schemas.

By default, Redis provides 16 logical databases, but the user can change this number. Such databases are identified by their index, starting from zero.

Another essential point is that Redis executes commands on a single thread. Therefore, operations on all logical databases go through the same execution pipeline.

3.2. Hash Slots

Redis Cluster works a bit differently from a standard deployment. To shard the data automatically and distribute it across the nodes of the cluster, it doesn’t use consistent hashing. Instead, it uses so-called hash slots.

Each cluster has 16384 hash slots, which are distributed across the nodes. For every operation, the Redis client computes the slot as the CRC16 of the key modulo 16384 and then uses it to route the command to the correct node.
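The slot calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client: the function names are made up for this example, and it assumes the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0), which is what Python’s `binascii.crc_hqx` computes when started from 0.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in every Redis Cluster


def crc16(data: bytes) -> int:
    # binascii.crc_hqx implements CRC-CCITT with polynomial 0x1021;
    # starting from 0 yields the XMODEM variant assumed here.
    return binascii.crc_hqx(data, 0)


def hash_slot(key: str) -> int:
    """Slot of a key that contains no hash tag: CRC16(key) mod 16384."""
    return crc16(key.encode("utf-8")) % HASH_SLOTS


if __name__ == "__main__":
    # Hypothetical keys, only to show that different keys usually
    # land in different slots (and therefore possibly different nodes).
    for key in ("user:1", "user:2", "order:42"):
        print(key, "->", hash_slot(key))
```

A client would then look up which node currently owns that slot and send the command there.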

Each node has a subset of hash slots assigned to it, and it’s possible to move slots between nodes using resharding or rebalance operations. Also, given the particularities of this approach, the Redis Cluster doesn’t allow multiple logical databases per node. Therefore, only the database zero is available in each node.

Last, due to the distribution of the keys in hash slots, Redis Cluster has a caveat when it comes to multiple key operations. Those operations are still available, although all the keys involved have to belong to the same hash slot. Otherwise, Redis rejects the request.

3.3. Hash Tags

This mechanism helps users guarantee that a group of keys goes to the same hash slot. To define a hash tag, the user adds a substring between curly braces in the key. For instance, the keys app1{user:123}.mykey1 and app1{user:123}.mykey2 would go to the same hash slot.

When a key contains a hash tag, Redis uses only that substring to compute the hash of the key. Consequently, all keys sharing the tag are routed to the same hash slot.
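As a rough sketch of how hash tags interact with slot calculation, the Python below extracts the tag before hashing and checks whether a set of keys shares one slot, which is the condition for multi-key operations in Redis Cluster. The helper names are hypothetical. The tag rule used here is an assumption of this example: the text between the first `{` and the first following `}`, and only when that text is non-empty.

```python
import binascii

HASH_SLOTS = 16384


def hashed_part(key: str) -> str:
    """Return the part of the key that is actually hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def hash_slot(key: str) -> int:
    data = hashed_part(key).encode("utf-8")
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def same_slot(*keys: str) -> bool:
    """True if a multi-key command over these keys would touch one slot only."""
    return len({hash_slot(k) for k in keys}) == 1


if __name__ == "__main__":
    print(same_slot("app1{user:123}.mykey1", "app1{user:123}.mykey2"))
```

Because both example keys reduce to `user:123` before hashing, they always share a slot, regardless of the rest of the key name.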

3.4. Asynchronous Replication

Both Redis Cluster and a standard deployment use asynchronous replication. That means Redis doesn’t wait for replicas to acknowledge a write before replying to the client.

As a result, there will always be a small window of time in which a failure may cause an acknowledged write to be lost. There are ways to mitigate this risk by shrinking that window as much as possible, but the risk never goes away, so Redis doesn’t guarantee strong consistency.
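To make the lost-write window concrete, here is a toy model in Python. It is not how Redis is implemented; `ToyMaster` and its methods are invented for this illustration. It only mimics the ordering of events: the write is acknowledged first and shipped to the replica later.

```python
from collections import deque


class ToyMaster:
    """Toy model of a master that acknowledges writes before replicating them."""

    def __init__(self) -> None:
        self.data = {}          # what the master holds
        self.replica = {}       # what the replica has received so far
        self.backlog = deque()  # acknowledged writes not yet replicated

    def write(self, key: str, value: str) -> str:
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"  # acknowledged immediately; replication happens later

    def replicate(self) -> None:
        # Ship every pending write to the replica.
        while self.backlog:
            key, value = self.backlog.popleft()
            self.replica[key] = value

    def fail_over(self) -> dict:
        """The master dies; the replica's view becomes the new data set."""
        return dict(self.replica)


if __name__ == "__main__":
    master = ToyMaster()
    master.write("a", "1")
    master.replicate()
    master.write("b", "2")     # acknowledged, but never replicated
    print(master.fail_over())  # the write to "b" is gone
```

The write to `b` was acknowledged to the client, yet it doesn’t survive the failover because the master failed before shipping it.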

3.5. Failover

Redis offers different mechanisms to deal with failures and guarantee some level of fault tolerance. Both Redis Cluster and a standard deployment with Sentinel have such tools, and both come with a failure detector based on health checks and timeouts.

Redis Cluster uses heartbeats and a gossip protocol. For the purpose of this article, we won’t go deeper into those. To sum up, each node talks to the other nodes, exchanges packets with metadata, and uses timeouts to detect when a particular node doesn’t respond.

Regarding Sentinel, each Sentinel instance monitors a Redis instance. The Sentinel instances also communicate among themselves and, based on the configuration, may execute a failover when communication problems and timeouts occur.

3.6. Master Election

Again, both implementations have voting strategies, and the process has different phases. The main goal is to decide whether a failover should happen and which replica to promote.

Sentinel works with the concept of a quorum. The quorum is the minimum number of Sentinel instances that must agree that a master node is unreachable.

Once the quorum is reached, the master is marked as failing, and a failover process will eventually start if possible. For that, at least a majority of Sentinels must authorize the failover and elect the Sentinel instance responsible for it. This instance then selects the best read replica to promote and executes the failover. Finally, it broadcasts the new setup to the other Sentinel instances.
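The two thresholds described above, the quorum for marking the master as failing and the majority for authorizing the failover, can be sketched as simple arithmetic. The function names are hypothetical and the model deliberately ignores timing and epochs. One assumption goes beyond the text: when the configured quorum is larger than the majority, this sketch requires that many votes for authorization as well.

```python
def is_marked_failing(reports_down: int, quorum: int) -> bool:
    """The master is marked as failing once `quorum` Sentinels report it unreachable."""
    return reports_down >= quorum


def votes_needed(total_sentinels: int, quorum: int) -> int:
    """Votes a Sentinel needs to be authorized to run the failover."""
    majority = total_sentinels // 2 + 1
    # Assumption: a quorum above the majority raises the bar accordingly.
    return max(majority, quorum)


def can_fail_over(reports_down: int, votes: int,
                  total_sentinels: int, quorum: int) -> bool:
    return (is_marked_failing(reports_down, quorum)
            and votes >= votes_needed(total_sentinels, quorum))


if __name__ == "__main__":
    # Five Sentinels with a quorum of 2: two reports are enough to flag
    # the master, but three votes are needed to actually fail over.
    print(can_fail_over(reports_down=2, votes=2, total_sentinels=5, quorum=2))
    print(can_fail_over(reports_down=2, votes=3, total_sentinels=5, quorum=2))
```

This separation is why a low quorum makes failure detection more sensitive without letting a minority of Sentinels perform the failover on its own.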

The process changes a bit when it comes to Redis Cluster. This time, the replica nodes are in charge. As we mentioned before, all nodes communicate with each other, so once one or more replicas detect the failure, they can start the election. The next step is to request votes from the masters. A replica that collects the votes of a majority of masters wins the right to perform the failover.

Given the nature of those vote mechanisms, Redis recommends always using an odd number of nodes in the cluster, which applies to both implementations.

3.7. Network Partition

Redis can survive many different failures, and its design is robust enough to provide continuous operation even when nodes go down. However, one of the critical problem scenarios we can face is the so-called network partition. Due to its topology, Redis faces challenging situations when dealing with such problems.

Split brain is one of the most straightforward problems in this category. Let’s use the following example:

Figure: Split brain step 1

Here we can observe a network partition. On one side, a client is communicating with the part of our cluster that can now reach only two instances, 1 and 2.

On the other side, another client communicates with the other partition, where only instances 3 and 4 are reachable. Assuming our quorum is 2, at some point a failover will happen on the right side, and we’ll reach this outcome:

Figure: Split brain step 2

Now we have two clusters (Redis 3 has been promoted to master). When the network partition eventually goes away, the cluster may have a problem if the same key was written on both sides with different values during the separation.

We can apply the same principles to a Redis Cluster, and the outcome would be similar. This illustrates why an odd number of nodes is recommended and why configurations such as min-replicas-to-write are used. The idea is to always end up with a majority partition and a minority partition, where only the majority is able to perform the failover.
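The majority argument can be checked with a small calculation. The sketch below is a simplified model with a hypothetical function name: it treats every voting member (Sentinels in a Sentinel deployment, masters in a Redis Cluster) as one vote and asks which side of a partition, if any, still holds a strict majority.

```python
from typing import List, Optional


def majority_partition(partition_sizes: List[int]) -> Optional[int]:
    """Return the index of the partition holding a strict majority of voters, if any."""
    total = sum(partition_sizes)
    for index, size in enumerate(partition_sizes):
        if 2 * size > total:
            return index
    return None


if __name__ == "__main__":
    # Four voters split 2/2: no side has a majority.
    print(majority_partition([2, 2]))
    # Five voters split 3/2: only the first side may fail over.
    print(majority_partition([3, 2]))
```

With an even number of voters, a clean split leaves no side able to decide, whereas an odd number guarantees that a two-way partition has exactly one majority side.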

As we may imagine, there are many other possible scenarios during network partitions. For that reason, it’s essential to understand them and keep them in mind when designing our cluster, no matter which option we choose.

4. Redis Sentinel vs Clustering

The table below compares some of their main features:

Table: Comparing Redis standard with Sentinel vs Cluster

With all these details and nuances in mind, we can now connect the dots between Redis Cluster and the standard implementation combined with Redis Sentinel.

We can conclude that Sentinel provides monitoring, fault tolerance, and notifications. It’s also a configuration provider. Moreover, it can provide authentication capabilities and client service discovery. A deployment with Sentinel keeps the 16 logical databases and offers asynchronous replication and high availability.

Nonetheless, the single-threaded nature of Redis and the barriers of vertical scaling are limiting factors to Sentinel’s scaling capabilities. However, for small and medium projects, it may be ideal.

It’s also true that deployment with Sentinel requires fewer nodes. Therefore, it’s more cost-effective.

A point worth mentioning is that we can also scale read-only operations further by using read replicas.

Redis Cluster is a fully distributed implementation with automated sharding (horizontal scaling), designed for high performance and linear scaling up to 1000 nodes. In addition, it provides a reasonable degree of write safety and the means to survive disasters, as long as the data of every master remains reachable from the majority side, either through the master itself or through a replica capable of performing the failover.

Another point is that during a network partition, the minority part of the cluster will stop accepting requests. A Sentinel deployment, on the other hand, may continue working partially, depending on the configuration. As mentioned previously, the availability and fault tolerance of the cluster depend on its configuration and composition.

The Cluster version can also reconfigure the mapping between masters and replicas, so the cluster can rebalance when multiple independent single-node failures occur. However, once again, a more robust cluster requires more nodes and, therefore, costs more. Besides, managing more nodes and balancing shards adds complexity that can’t be overlooked.

To sum up, Redis Cluster shines in large deployments with big data sets that require high throughput and scaling capabilities.

5. Conclusion

In this article, we discussed Redis and its different implementations: Redis Cluster and standard Redis with Sentinel.

We also covered the basic building blocks of each solution and how to use them to get the most out of it. With these, we should have the tools to decide which option to use based on the use case.