With failover time in single-digit seconds
Most NoSQL deployments use three replicas to ensure high availability (HA). At a high level, the first replica stores your dataset, the second is used for failover, and the third serves as a tiebreaker in the event of a network split. Because DRAM is expensive, maintaining three in-memory replicas significantly increases infrastructure cost. Redis Enterprise, by contrast, provides a fully HA system with only two replicas: the tiebreaker is determined at the node level by deploying an odd number of nodes in the cluster. The example below compares the infrastructure cost of running a 90GB HA OSS Redis dataset on Amazon Web Services with three replicas against the cost of a Redis Enterprise cluster that uses two replicas and a quorum node:
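To make the memory side of this comparison concrete, the sketch below counts the DRAM needed to hold the 90GB dataset under each layout. It assumes the quorum node stores no copy of the data and ignores replication buffers, memory fragmentation, and OS overhead; the function name is hypothetical and no AWS prices are modeled.

```python
# Illustrative DRAM arithmetic for the 90GB example; no prices are modeled.
# Assumption: the quorum node holds no copy of the dataset. Replication
# buffers, fragmentation, and OS overhead are ignored.

DATASET_GB = 90  # dataset size from the example above


def replicated_dram_gb(dataset_gb: float, copies: int) -> float:
    """DRAM needed to hold `copies` full copies of the dataset."""
    return dataset_gb * copies


three_replicas = replicated_dram_gb(DATASET_GB, 3)  # three full replicas
two_replicas = replicated_dram_gb(DATASET_GB, 2)    # two replicas + data-less quorum node
reduction = 1 - two_replicas / three_replicas

print(f"three replicas: {three_replicas:.0f} GB")
print(f"two replicas:   {two_replicas:.0f} GB")
print(f"DRAM reduction: {reduction:.0%}")
```

Under these assumptions, the two-replica layout keeps 180GB of data in DRAM instead of 270GB, about a third less; the actual cost difference also depends on the instance types chosen and on the size of the quorum node.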
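Redis Enterprise uses diskless replication on both the master and the slave: the master streams its dataset snapshot directly to the slave over the network, and the slave loads it straight into memory, so no intermediate RDB file is written to disk on either side. This is shown in the figure below: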
In addition, Redis Enterprise uses PSYNC2 for its core operations, so after a planned failover or a shard migration the existing replication link is preserved rather than rebuilt with a full resynchronization.
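PSYNC2 is the revision of the replication protocol introduced in OSS Redis 4.0. The simplified model below, loosely based on the OSS Redis logic rather than on Redis Enterprise internals, shows why a replication link can survive a failover: the promoted replica remembers its old master's replication ID as a secondary ID (`replid2`), so replicas that followed the old master can continue from their offset instead of reloading the whole dataset. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Simplified view of a master's replication state (hypothetical names)."""
    replid: str                # current replication ID
    replid2: str               # previous master's ID, kept after a failover
    second_replid_offset: int  # highest requested offset still valid under replid2
    backlog_start: int         # first offset still held in the replication backlog
    backlog_len: int           # number of bytes held in the backlog


def can_partial_resync(m: MasterState, req_replid: str, req_offset: int) -> bool:
    """True if a replica asking to continue `req_replid` from `req_offset`
    (the next byte it needs) can be served from the backlog instead of
    receiving a full copy of the dataset."""
    same_history = req_replid == m.replid or (
        req_replid == m.replid2 and req_offset <= m.second_replid_offset
    )
    if not same_history:
        return False  # unrelated or diverged history: full resync needed
    # The requested data must still be available in the backlog.
    return m.backlog_start <= req_offset <= m.backlog_start + m.backlog_len


# A replica promoted after failover: it followed master "A" up to offset 1000,
# then started its own history "B" and kept accepting writes.
promoted = MasterState(replid="B", replid2="A", second_replid_offset=1001,
                       backlog_start=1, backlog_len=1500)

print(can_partial_resync(promoted, "A", 1001))  # True: old master's replica continues
print(can_partial_resync(promoted, "A", 1200))  # False: beyond the fork point
print(can_partial_resync(promoted, "C", 1001))  # False: unknown history
```

A replica that asks to continue the old history past the point where the two histories diverged is refused and falls back to a full resynchronization, which is exactly the expensive path PSYNC2 avoids in the common case.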
A Redis Enterprise cluster uses two watchdog processes to detect failures:
These watchdog processes are part of the distributed cluster manager and run on every node of the cluster. Failure detection must be managed by entities running inside the cluster to avoid situations like the one shown on the left side of the figure below, where the watchdog entity is located on the wrong side of the network split and cannot trigger the failover process:
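Why the location of the failure detector matters can be illustrated with a simple majority-quorum rule. This rule is an assumption for illustration, not a description of the Redis Enterprise cluster manager: a partition may trigger a failover only if it contains a strict majority of the cluster's nodes. The sketch checks every two-way split for clusters of three, four, and five nodes; the function names are hypothetical.

```python
def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """A partition may act (e.g. trigger a failover) only with a strict majority."""
    return 2 * partition_size > cluster_size


def split_outcome(cluster_size: int, side_a: int) -> tuple[bool, bool]:
    """Quorum status of both sides when `side_a` nodes are cut off from the rest."""
    side_b = cluster_size - side_a
    return has_quorum(side_a, cluster_size), has_quorum(side_b, cluster_size)


for n in (3, 4, 5):
    for a in range(1, n):
        print(f"{n} nodes, split {a}/{n - a}: quorum {split_outcome(n, a)}")
```

With an odd number of nodes, exactly one side of any two-way split holds a majority, so there is always a single side that is allowed to act, while a detector stranded on the minority side cannot. With four nodes, a 2/2 split leaves neither side with quorum, which is the tiebreaking role an odd node count plays in a two-replica cluster.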
Once a failure event is detected, the Redis Enterprise cluster automatically and transparently runs a set of internal distributed processes that fail over the relevant shard(s) and, if needed, endpoint(s) to healthy cluster nodes. If necessary, these processes also reroute user traffic through a different proxy or proxies.
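The following toy model walks through the shard and endpoint handling described above when a single node fails: masters on the failed node are replaced by their replicas, replicas lost with that node are dropped pending a rebuild, and endpoints whose proxy ran there are moved to a healthy node. All names and data structures are hypothetical, and the real cluster manager performs these steps as distributed processes rather than in a single function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    name: str
    master_node: str
    replica_node: Optional[str]  # None when the shard has no healthy replica


def fail_over_shards(shards: list[Shard], failed_node: str) -> list[str]:
    """Promote replicas of masters hosted on `failed_node`; return a log of actions."""
    actions = []
    for shard in shards:
        if shard.master_node == failed_node:
            if shard.replica_node:
                shard.master_node, shard.replica_node = shard.replica_node, None
                actions.append(f"{shard.name}: replica on {shard.master_node} promoted")
            else:
                actions.append(f"{shard.name}: no replica available, shard is down")
        elif shard.replica_node == failed_node:
            shard.replica_node = None  # a new replica must be rebuilt on a healthy node
            actions.append(f"{shard.name}: replica lost, master stays on {shard.master_node}")
    return actions


def reroute_endpoints(endpoints: dict[str, str], failed_node: str,
                      healthy_nodes: list[str]) -> dict[str, str]:
    """Move endpoints whose proxy ran on `failed_node` to the first healthy node."""
    return {ep: (healthy_nodes[0] if node == failed_node else node)
            for ep, node in endpoints.items()}


shards = [Shard("shard-1", "node1", "node2"),
          Shard("shard-2", "node2", "node1"),
          Shard("shard-3", "node3", "node2")]
for line in fail_over_shards(shards, "node1"):
    print(line)
print(reroute_endpoints({"db1": "node1", "db2": "node3"}, "node1", ["node2", "node3"]))
```

Picking the first healthy node for a rerouted endpoint is a placeholder; a realistic placement would also weigh node load and zone/rack constraints.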
The Redis Enterprise cluster ships with out-of-the-box HA profiles for noisy (public cloud) and quiet (virtual private cloud, on-premises) environments. We have found that triggering failovers too aggressively in a noisy environment can create stability issues, because transient network delays are mistaken for node failures. In a quiet network environment, on the other hand, a Redis Enterprise cluster can easily be tuned to deliver a consistent single-digit (<10 sec) failover time in all failure scenarios.
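The trade-off between the two profiles can be expressed as a back-of-the-envelope model in which failover time equals detection time (heartbeat interval multiplied by the number of consecutive missed heartbeats) plus the time needed to promote a replica and reroute traffic. The class and every parameter value below are hypothetical and chosen only for illustration; they are not Redis Enterprise's actual profile settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HAProfile:
    """Hypothetical failure-detection profile; values below are illustrative only."""
    name: str
    heartbeat_interval_s: float  # how often node liveness is checked
    missed_beats: int            # consecutive misses before a node is declared failed
    promotion_s: float           # time to promote replicas and reroute endpoints

    def detection_s(self) -> float:
        return self.heartbeat_interval_s * self.missed_beats

    def failover_s(self) -> float:
        return self.detection_s() + self.promotion_s


# NOT Redis Enterprise's real settings: chosen only to show the trade-off.
noisy = HAProfile("noisy (public cloud)", 1.0, 15, 2.0)
quiet = HAProfile("quiet (VPC/on-premises)", 1.0, 5, 2.0)

for profile in (noisy, quiet):
    print(f"{profile.name}: detection {profile.detection_s():.0f}s, "
          f"failover {profile.failover_s():.0f}s")
```

Lowering `missed_beats` shortens failover, but it also makes it more likely that transient network jitter is mistaken for a node failure, which is why a more conservative setting suits noisy environments.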
Redis Enterprise supports multi-AZ/rack cluster configurations. In this mode, each cluster node is tagged with the zone/rack it is deployed in, and Redis Enterprise ensures that the master and slave Redis processes of the same shard are never hosted on nodes located in the same AZ/rack. Running Redis Enterprise in a multi-AZ/rack environment requires that the following conditions be met:
An example of a Redis Enterprise multi-AZ configuration in the cloud is shown here:
As you can see, this example meets all the conditions discussed above:
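The central placement rule of this mode, that a shard's master and slave never share an AZ/rack, can be illustrated with the sketch below. It assigns each shard's master and slave to nodes in different zones using a simple round-robin over zones and then verifies the result. The round-robin strategy and all names are assumptions for illustration, not the placement algorithm Redis Enterprise actually uses, and the sketch does not encode the full list of multi-AZ/rack requirements.

```python
from itertools import cycle


def place_shards(shard_count: int,
                 nodes_by_zone: dict[str, list[str]]) -> dict[int, tuple[str, str]]:
    """Assign (master_node, slave_node) per shard, never in the same zone."""
    zones = sorted(z for z, nodes in nodes_by_zone.items() if nodes)
    if len(zones) < 2:
        raise ValueError("need nodes in at least two zones/racks")
    next_node = {z: cycle(nodes_by_zone[z]) for z in zones}
    placement = {}
    for shard in range(shard_count):
        # Adjacent positions in the zone ring always differ when there are >= 2 zones.
        master_zone = zones[shard % len(zones)]
        slave_zone = zones[(shard + 1) % len(zones)]
        placement[shard] = (next(next_node[master_zone]), next(next_node[slave_zone]))
    return placement


def zone_of(node: str, nodes_by_zone: dict[str, list[str]]) -> str:
    return next(z for z, nodes in nodes_by_zone.items() if node in nodes)


def is_zone_safe(placement: dict[int, tuple[str, str]],
                 nodes_by_zone: dict[str, list[str]]) -> bool:
    """True if no shard has its master and slave in the same zone."""
    return all(zone_of(m, nodes_by_zone) != zone_of(s, nodes_by_zone)
               for m, s in placement.values())


nodes = {"az-a": ["node1"], "az-b": ["node2"], "az-c": ["node3"]}
placement = place_shards(6, nodes)
print(placement)
print(is_zone_safe(placement, nodes))
```

Rotating the master zone per shard spreads masters evenly across zones, so losing a single AZ/rack promotes only a fraction of the shards rather than all of them.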