Redis Sentinel Overview
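Like MySQL, Redis has a built-in master-slave replication mechanism. In production, if the master crashes, we want a slave to be promoted to master automatically and immediately so that it can take over the workload. How can this be achieved? A Redis Sentinel cluster provides exactly this capability.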
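Redis Sentinel is the official, native high-availability solution for Redis. A Redis Sentinel deployment consists of two parts: the Redis Sentinel cluster and the Redis master-slave cluster. The Sentinel cluster is a distributed cluster made up of several Sentinel nodes.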
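Sentinel provides failure detection, failover, configuration management and client notification. The number of Sentinel nodes should be odd, i.e. 2n+1 (n>=1). The official recommendation is at least 3.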
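The odd-number recommendation comes from majority arithmetic. The sentinel that performs a failover must be elected by a majority of all sentinels. So 2n+1 sentinels keep working with n of them unreachable, while a (2n+2)-th node raises the majority without tolerating any extra failure. The short Python sketch below only does that arithmetic. `majority` and `tolerated_failures` are illustrative helper names, not part of Redis.

```python
def majority(total: int) -> int:
    """Smallest number of sentinels that forms a strict majority of `total`."""
    return total // 2 + 1


def tolerated_failures(total: int) -> int:
    """How many sentinels may be unreachable while a majority can still be formed."""
    return total - majority(total)


if __name__ == "__main__":
    for total in range(3, 8):
        print(f"{total} sentinels: majority={majority(total)}, "
              f"tolerates {tolerated_failures(total)} failure(s)")
```

Running it shows that 3 and 4 sentinels both tolerate a single failure, and 5 and 6 both tolerate two. The extra even node adds cost without adding availability.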
Redis Sentinel Features
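(1) Failover between master and slave is driven by the sentinels' monitoring. Consider 5 sentinels with the quorum set to 2. A failover is triggered as soon as 2 sentinels consider the master crashed. However, the sentinel that carries out the failover must first be authorized by at least 3 sentinels (a majority).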
(2) The sentinel cluster never runs multiple failovers concurrently. If the first sentinel that attempts the failover fails, another sentinel retries after a certain timeout, and so on.
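(3) After a failover, the sentinel that performed it obtains a new configuration version number (the configuration epoch) for the master. It then broadcasts the new configuration to the other sentinels. As a result, sentinels that can communicate with each other eventually converge on the same configuration: the one with the highest version number.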
(4) Redis Sentinel version 1 shipped with Redis 2.6 and Redis Sentinel version 2 with Redis 2.8. Sentinel 2 is recommended.
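(5) Redis Sentinel is the officially recommended high-availability (HA) solution for Redis. Sentinel runs as a separate process. It can monitor multiple master-slave clusters and switch over automatically when it detects that a master is down. One set of sentinels can monitor any number of masters, together with their slaves. It automatically performs a failover when a monitored master goes offline.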
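Item (1) mixes two separate thresholds, which can be expressed as two checks. The quorum decides when the master is objectively down. A separate vote decides which sentinel may act. The sketch below is a simplified model. To the best of our understanding of the Redis source, the elected leader needs at least max(majority, quorum) votes, and the model assumes that rule. The function names are hypothetical.

```python
def majority(total: int) -> int:
    return total // 2 + 1


def is_odown(reporting_down: int, quorum: int) -> bool:
    """The quorum only decides when the master is considered objectively down."""
    return reporting_down >= quorum


def may_start_failover(total: int, quorum: int, reporting_down: int, votes: int) -> bool:
    """Failover needs ODOWN *and* a leader elected with enough votes.

    Simplified model: the leader needs at least max(majority, quorum) votes.
    """
    if not is_odown(reporting_down, quorum):
        return False
    return votes >= max(majority(total), quorum)


if __name__ == "__main__":
    # 5 sentinels with quorum 2, as in item (1)
    print(is_odown(2, quorum=2))                                # True: ODOWN reached
    print(may_start_failover(5, 2, reporting_down=2, votes=2))  # False: 2 votes < 3
    print(may_start_failover(5, 2, reporting_down=2, votes=3))  # True: majority of 5
```

Under this model, setting the quorum above the majority (for example 3 out of 3 sentinels) also raises the number of votes the leader needs.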
SDOWN and ODOWN
SDOWN (Subjectively Down) means that a single sentinel has detected, on its own, that the master is down.
ODOWN (Objectively Down) means that enough sentinels, at least the number configured as the quorum, agree that the master is down.
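Moving from SDOWN to ODOWN does not use a strong consensus algorithm, only a form of gossip. A sentinel that sees the master as SDOWN may learn from enough other sentinels that they also consider the master down. When that happens, the state changes from SDOWN to ODOWN. If the master later becomes available again, the state is cleared accordingly.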
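The two states can be modelled per sentinel as below. This is a simplified sketch. It assumes a sentinel counts peers toward ODOWN only while it sees the master as SDOWN itself, and it uses down-after-milliseconds as the SDOWN timeout. `SentinelView` and its fields are hypothetical names. The real Sentinel collects peer opinions with `SENTINEL is-master-down-by-addr` requests.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SentinelView:
    """One sentinel's view of one monitored master (simplified, hypothetical model)."""
    down_after_ms: int                  # sentinel down-after-milliseconds
    quorum: int                         # last argument of `sentinel monitor`
    last_valid_reply_ms: int = 0        # when the master last answered PING
    peers_reporting_down: Set[str] = field(default_factory=set)

    def is_sdown(self, now_ms: int) -> bool:
        # Subjectively down: no valid reply for longer than down-after-milliseconds.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms

    def is_odown(self, now_ms: int) -> bool:
        # Objectively down: this sentinel sees SDOWN itself and, counting itself,
        # at least `quorum` sentinels consider the master down.
        if not self.is_sdown(now_ms):
            return False
        return 1 + len(self.peers_reporting_down) >= self.quorum


if __name__ == "__main__":
    view = SentinelView(down_after_ms=30_000, quorum=2)
    print(view.is_sdown(10_000), view.is_sdown(31_000))  # within 30 s, then SDOWN
    print(view.is_odown(31_000))                         # only this sentinel agrees
    view.peers_reporting_down.add("ab45fe6c")            # a peer reports the master down
    print(view.is_odown(31_000))                         # quorum 2 reached: ODOWN
    view.last_valid_reply_ms = 31_000                    # master answers again
    print(view.is_odown(32_000))                         # state is cleared
```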
Relevant sentinel.conf Parameters
port 26379
#port the sentinel listens on
sentinel monitor mymaster 127.0.0.1 6379 2
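#mymaster is the name under which sentinel monitors this master (the name used in the sample config; any name can be chosen). The last number, 2, is the quorum: if two sentinels consider the master down, the master is treated as unavailable.
Note: by configuring different master names, one Sentinel cluster can monitor multiple Redis master-slave clusters.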
sentinel down-after-milliseconds mymaster 30000
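# Default 30 seconds. Sentinel uses PING to check whether the master is alive. If the master returns no valid reply (such as PONG) for 30 seconds, this sentinel considers it unavailable (SDOWN); otherwise the master is considered healthy.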
sentinel parallel-syncs mymaster 1
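#Once the sentinels agree that the master has failed, the sentinel leader performs the failover and selects a new master. The remaining slaves then start replicating from the new master. This parameter limits how many slaves replicate from the new master at the same time, here 1.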
sentinel failover-timeout mymaster 180000
# Failover timeout: 180000 ms, i.e. 3 minutes
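To see what parallel-syncs changes, the sketch below groups the remaining slaves into the waves in which they would be reconfigured to replicate from the new master. It is a simplified model that assumes every resync takes the same time. Sentinel itself only caps how many slaves are resyncing at once. `resync_batches` and the slave names are hypothetical.

```python
from typing import List


def resync_batches(slaves: List[str], parallel_syncs: int) -> List[List[str]]:
    """Group slaves into the waves in which they are pointed at the new master.

    Simplified model: assumes every resync takes the same time, so "at most
    `parallel_syncs` at once" becomes fixed-size waves.
    """
    if parallel_syncs < 1:
        raise ValueError("parallel-syncs must be at least 1")
    return [slaves[i:i + parallel_syncs] for i in range(0, len(slaves), parallel_syncs)]


if __name__ == "__main__":
    remaining = ["slave1", "slave2", "slave3"]  # hypothetical slaves after a failover
    print(resync_batches(remaining, 1))         # one slave at a time
    print(resync_batches(remaining, 2))         # two at a time
```

A value of 1 keeps the other slaves available while each one resyncs, at the cost of a longer overall reconfiguration.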
Authentication in Redis Sentinel
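When a master requires a password, both clients and slaves must provide it when connecting.
The master sets its own password with requirepass; clients that do not supply it cannot connect to the master.
A slave sets the password it uses to access the master with masterauth.
When Sentinel is used, however, a master may become a slave and a slave may become a master, so every node needs both settings.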
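Because roles can swap, a quick consistency check over all data nodes is useful. The sketch below is a hypothetical helper that takes already-parsed redis.conf settings per node. It reports any node missing requirepass or masterauth, and flags nodes whose passwords differ. The password value is made up. The sentinels themselves also need the password, which sentinel.conf supplies with `sentinel auth-pass <master-name> <password>`; this sketch does not model that part.

```python
from typing import Dict, List


def check_auth(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return problems found in per-node auth settings (hypothetical helper)."""
    problems = []
    passwords = set()
    for node, conf in nodes.items():
        # Every node may end up as master or slave, so it needs both settings.
        for key in ("requirepass", "masterauth"):
            value = conf.get(key)
            if not value:
                problems.append(f"{node}: {key} is not set")
            else:
                passwords.add(value)
    # After a role swap, slaves authenticate to the new master with masterauth,
    # so all nodes should share a single password.
    if len(passwords) > 1:
        problems.append("nodes use different passwords")
    return problems


if __name__ == "__main__":
    nodes = {
        "172.16.101.58": {"requirepass": "s3cret"},  # masterauth forgotten on the master
        "172.16.101.59": {"requirepass": "s3cret", "masterauth": "s3cret"},
        "172.16.101.60": {"requirepass": "s3cret", "masterauth": "s3cret"},
    }
    for problem in check_auth(nodes):
        print(problem)
```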
Installing Redis Sentinel
(1) Redis Sentinel architecture and node environment
Role | Host | IP | Port |
Sentinel1 | sht-sgmhadoopcm-01 | 172.16.101.54 | 26379 |
Sentinel2 | sht-sgmhadoopnn-01 | 172.16.101.55 | 26379 |
Sentinel3 | sht-sgmhadoopnn-02 | 172.16.101.56 | 26379 |
Master | sht-sgmhadoopdn-01 | 172.16.101.58 | 6379 |
Slave1 | sht-sgmhadoopdn-02 | 172.16.101.59 | 6379 |
Slave2 | sht-sgmhadoopdn-03 | 172.16.101.60 | 6379 |
(2) Configure Redis master-slave replication
[root@sht-sgmhadoopdn-01 redis]# vim redis.conf
bind 172.16.101.58
[root@sht-sgmhadoopdn-01 redis]# src/redis-server redis.conf

[root@sht-sgmhadoopdn-02 redis]# vim redis.conf
bind 172.16.101.59
slaveof 172.16.101.58 6379
[root@sht-sgmhadoopdn-02 redis]# src/redis-server redis.conf

[root@sht-sgmhadoopdn-03 redis]# vim redis.conf
bind 172.16.101.60
slaveof 172.16.101.58 6379
[root@sht-sgmhadoopdn-03 redis]# src/redis-server redis.conf
Check the replication setup
[root@sht-sgmhadoopdn-01 redis]# src/redis-cli -h 172.16.101.58
172.16.101.58:6379> client list
id=3 addr=172.16.101.59:35718 fd=7 name= age=26 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=replconf
id=4 addr=172.16.101.60:33986 fd=8 name= age=22 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=replconf
id=5 addr=172.16.101.58:38875 fd=9 name= age=4 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
172.16.101.58:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.101.59,port=6379,state=online,offset=57,lag=0
slave1:ip=172.16.101.60,port=6379,state=online,offset=57,lag=0
master_repl_offset:57
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:56
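When this check is scripted rather than read by eye, the INFO output is easy to parse. Each line is `key:value` and section headers start with `#`. The sketch below uses hypothetical helpers and runs on the replication output captured above. It lists the slaves whose state is `online`, which should match `connected_slaves`.

```python
from typing import Dict, List

SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=172.16.101.59,port=6379,state=online,offset=57,lag=0
slave1:ip=172.16.101.60,port=6379,state=online,offset=57,lag=0
master_repl_offset:57
"""


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output into a dict, skipping blank lines and '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def online_slaves(info: Dict[str, str]) -> List[str]:
    """Return "ip:port" for every slaveN entry whose state is online."""
    result = []
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            if fields.get("state") == "online":
                result.append(f"{fields['ip']}:{fields['port']}")
    return result


if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["role"], info["connected_slaves"])
    print(online_slaves(info))
```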
(3) Configure the Sentinel cluster
All three sentinel nodes use the same sentinel.conf. If several sentinels run on the same host, each one needs a different port.
[root@sht-sgmhadoopcm-01 redis]# vim sentinel.conf
port 26379
daemonize yes
protected-mode no
logfile "sentinel.log"
dir /usr/local/redis
sentinel monitor mymaster 172.16.101.58 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
A sentinel node can be started in either of two ways:
src/redis-sentinel sentinel.conf
src/redis-server sentinel.conf --sentinel

[root@sht-sgmhadoopcm-01 redis]# src/redis-sentinel sentinel.conf
[root@sht-sgmhadoopnn-01 redis]# src/redis-sentinel sentinel.conf
[root@sht-sgmhadoopnn-02 redis]# src/redis-sentinel sentinel.conf
[root@sht-sgmhadoopcm-01 redis]# ps -ef|grep redis|grep -v grep
root 7541 1 0 22:33 ? 00:00:00 src/redis-sentinel *:26379 [sentinel]
(4) Check the status of the whole cluster
[root@sht-sgmhadoopcm-01 redis]# src/redis-cli -h 172.16.101.54 -p 26379
172.16.101.54:26379> client list
id=3 addr=172.16.101.55:43182 fd=13 name=sentinel-ab45fe6c-cmd age=138 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=4 addr=172.16.101.56:60016 fd=15 name=sentinel-e32f20c0-cmd age=136 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=5 addr=172.16.101.54:35342 fd=17 name= age=26 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
172.16.101.54:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.16.101.58:6379,slaves=2,sentinels=3

[root@sht-sgmhadoopdn-01 redis]# src/redis-cli -h 172.16.101.58 -p 6379
172.16.101.58:6379> client list
id=16 addr=172.16.101.54:56510 fd=10 name=sentinel-30393e76-pubsub age=326 idle=0 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=17 addr=172.16.101.54:56508 fd=11 name=sentinel-30393e76-cmd age=326 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=18 addr=172.16.101.55:57444 fd=12 name=sentinel-ab45fe6c-cmd age=177 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=publish
id=19 addr=172.16.101.55:57446 fd=13 name=sentinel-ab45fe6c-pubsub age=177 idle=0 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=3 addr=172.16.101.59:35718 fd=7 name= age=3936 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=replconf
id=4 addr=172.16.101.60:33986 fd=8 name= age=3932 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=replconf
id=20 addr=172.16.101.56:55648 fd=14 name=sentinel-e32f20c0-cmd age=173 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=21 addr=172.16.101.56:55650 fd=15 name=sentinel-e32f20c0-pubsub age=173 idle=0 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=5 addr=172.16.101.58:38875 fd=9 name= age=3914 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
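The connection names on the master show that each sentinel keeps two links to it. The `-cmd` link carries commands such as PING. The `-pubsub` link is subscribed to the channel that, per the Redis docs (`__sentinel__:hello`), sentinels use to discover each other. The sketch below is a hypothetical helper that groups those links by sentinel ID prefix. Its sample lines are abbreviated from the master's CLIENT LIST output above. On a healthy setup, every sentinel should show both link types.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# As seen above, connection names look like "sentinel-<8 hex chars>-<cmd|pubsub>".
LINK_RE = re.compile(r"\bname=sentinel-([0-9a-f]+)-(cmd|pubsub)\b")

SAMPLE = """\
id=16 addr=172.16.101.54:56510 fd=10 name=sentinel-30393e76-pubsub age=326 cmd=subscribe
id=17 addr=172.16.101.54:56508 fd=11 name=sentinel-30393e76-cmd age=326 cmd=ping
id=18 addr=172.16.101.55:57444 fd=12 name=sentinel-ab45fe6c-cmd age=177 cmd=publish
id=19 addr=172.16.101.55:57446 fd=13 name=sentinel-ab45fe6c-pubsub age=177 cmd=subscribe
id=3 addr=172.16.101.59:35718 fd=7 name= age=3936 flags=S cmd=replconf
"""


def sentinel_links(client_list: str) -> Dict[str, Set[str]]:
    """Map sentinel ID prefix -> set of link types found in CLIENT LIST output."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for line in client_list.splitlines():
        match = LINK_RE.search(line)
        if match:
            links[match.group(1)].add(match.group(2))
    return dict(links)


if __name__ == "__main__":
    for sentinel, kinds in sentinel_links(SAMPLE).items():
        print(sentinel, sorted(kinds))
```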
After the master, slaves and sentinels are started, each sentinel automatically adds or rewrites entries in its sentinel.conf:
[root@sht-sgmhadoopcm-01 redis]# cat sentinel.conf
sentinel myid 30393e76e002cb64db92fb8bcb88d79f2d85a82b
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 172.16.101.60 6379
sentinel known-slave mymaster 172.16.101.59 6379
sentinel known-sentinel mymaster 172.16.101.55 26379 ab45fe6c0f010473ce3b7b4d2120e1a83776b736
sentinel known-sentinel mymaster 172.16.101.56 26379 e32f20c0f315e712c9921371f15729246f3816a0
sentinel current-epoch 0

[root@sht-sgmhadoopnn-01 redis]# cat sentinel.conf
sentinel myid ab45fe6c0f010473ce3b7b4d2120e1a83776b736
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 172.16.101.60 6379
sentinel known-slave mymaster 172.16.101.59 6379
sentinel known-sentinel mymaster 172.16.101.56 26379 e32f20c0f315e712c9921371f15729246f3816a0
sentinel known-sentinel mymaster 172.16.101.54 26379 30393e76e002cb64db92fb8bcb88d79f2d85a82b
sentinel current-epoch 0

[root@sht-sgmhadoopnn-02 redis]# cat sentinel.conf
sentinel myid e32f20c0f315e712c9921371f15729246f3816a0
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 172.16.101.60 6379
sentinel known-slave mymaster 172.16.101.59 6379
sentinel known-sentinel mymaster 172.16.101.54 26379 30393e76e002cb64db92fb8bcb88d79f2d85a82b
sentinel known-sentinel mymaster 172.16.101.55 26379 ab45fe6c0f010473ce3b7b4d2120e1a83776b736
sentinel current-epoch 0
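The `config-epoch` entries above are the configuration version numbers that let sentinels converge after a failover. The rule can be reduced to "keep whichever configuration has the higher epoch". The sketch below is a simplified model, not Sentinel's actual code. `MasterConfig` and `merge` are hypothetical names. It assumes the failover leader publishes the new master with epoch 1, as the `+new-epoch 1` line in the failover log below suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterConfig:
    """What one sentinel believes about a monitored master (simplified model)."""
    address: str
    config_epoch: int


def merge(local: MasterConfig, received: MasterConfig) -> MasterConfig:
    """Adopt a received configuration only if it carries a strictly higher epoch."""
    return received if received.config_epoch > local.config_epoch else local


if __name__ == "__main__":
    before = MasterConfig("172.16.101.58:6379", 0)  # state in the files above
    after = MasterConfig("172.16.101.60:6379", 1)   # broadcast by the failover leader
    stale = MasterConfig("172.16.101.58:6379", 0)   # a late, outdated message

    view = merge(before, after)
    view = merge(view, stale)                        # the stale message is ignored
    print(view)
```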
Testing automatic failover
[root@sht-sgmhadoopdn-01 redis]# ps -ef|grep redis
root 15128 1 0 21:17 ? 00:00:05 src/redis-server 172.16.101.58:6379
[root@sht-sgmhadoopdn-01 redis]# kill -9 15128

[root@sht-sgmhadoopcm-01 redis]# tail -f sentinel.log
7541:X 05 Aug 22:55:48.052 # +sdown master mymaster 172.16.101.58 6379   # after 30 s without a reply, this sentinel subjectively considers the master down
7541:X 05 Aug 22:55:48.143 # +odown master mymaster 172.16.101.58 6379 #quorum 2/2   # two sentinels (the quorum) agree, so the master is objectively down
7541:X 05 Aug 22:55:48.143 # +new-epoch 1
7541:X 05 Aug 22:55:48.143 # +try-failover master mymaster 172.16.101.58 6379
7541:X 05 Aug 22:55:48.165 # +vote-for-leader 30393e76e002cb64db92fb8bcb88d79f2d85a82b 1
7541:X 05 Aug 22:55:48.166 # ab45fe6c0f010473ce3b7b4d2120e1a83776b736 voted for ab45fe6c0f010473ce3b7b4d2120e1a83776b736 1
7541:X 05 Aug 22:55:48.173 # e32f20c0f315e712c9921371f15729246f3816a0 voted for ab45fe6c0f010473ce3b7b4d2120e1a83776b736 1
7541:X 05 Aug 22:55:48.544 # +config-update-from sentinel ab45fe6c0f010473ce3b7b4d2120e1a83776b736 172.16.101.55 26379 @ mymaster 172.16.101.58 6379   # ab45fe6c (172.16.101.55) won 2 of 3 votes, led the failover, and this sentinel adopts its new configuration
7541:X 05 Aug 22:55:48.544 # +switch-master mymaster 172.16.101.58 6379 172.16.101.60 6379
7541:X 05 Aug 22:55:48.545 * +slave slave 172.16.101.59:6379 172.16.101.59 6379 @ mymaster 172.16.101.60 6379
7541:X 05 Aug 22:55:48.545 * +slave slave 172.16.101.58:6379 172.16.101.58 6379 @ mymaster 172.16.101.60 6379
# The failover is already complete at +switch-master. The 30 s gap before the next line is down-after-milliseconds again: the old master, now tracked as a slave of 172.16.101.60, is marked SDOWN only after 30 s without a reply.
7541:X 05 Aug 22:56:18.562 # +sdown slave 172.16.101.58:6379 172.16.101.58 6379 @ mymaster 172.16.101.60 6379
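When monitoring scripts need to notice the switch, the `+switch-master` line carries everything required. Its arguments are the master name, the old IP and port, and the new IP and port. The sketch below uses hypothetical helpers that parse sentinel log lines in the format shown above and return the latest master address. The regular expression is fitted to these sample lines only. Other Redis versions may format the timestamp differently.

```python
import re
from typing import List, Optional, Tuple

# "<pid>:X <day> <month> <hh:mm:ss.mmm> <# or *> <event> <arguments>", as in the log above
LOG_RE = re.compile(r"^\d+:X \d+ \w+ [\d:.]+ [#*] (\S+) ?(.*)$")

SAMPLE = """\
7541:X 05 Aug 22:55:48.052 # +sdown master mymaster 172.16.101.58 6379
7541:X 05 Aug 22:55:48.143 # +odown master mymaster 172.16.101.58 6379 #quorum 2/2
7541:X 05 Aug 22:55:48.544 # +switch-master mymaster 172.16.101.58 6379 172.16.101.60 6379
7541:X 05 Aug 22:55:48.545 * +slave slave 172.16.101.59:6379 172.16.101.59 6379 @ mymaster 172.16.101.60 6379
"""


def parse_events(log_text: str) -> List[Tuple[str, str]]:
    """Return (event, arguments) pairs for every recognised log line."""
    events = []
    for line in log_text.splitlines():
        match = LOG_RE.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def current_master(events: List[Tuple[str, str]], name: str) -> Optional[Tuple[str, int]]:
    """Address from the latest +switch-master event for `name`, or None."""
    master = None
    for event, args in events:
        parts = args.split()
        # +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
        if event == "+switch-master" and parts and parts[0] == name:
            master = (parts[3], int(parts[4]))
    return master


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    print([event for event, _ in events])
    print(current_master(events, "mymaster"))
```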
The roles have changed. 172.16.101.60 is now the master, and 172.16.101.58 is listed as a slave (currently down and disconnected).
[root@sht-sgmhadoopcm-01 redis]# src/redis-cli -h 172.16.101.54 -p 26379
172.16.101.54:26379> sentinel masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "172.16.101.60"
......
172.16.101.54:26379> sentinel slaves mymaster
1)  1) "name"
    2) "172.16.101.58:6379"
    3) "ip"
    4) "172.16.101.58"
    9) "flags"
   10) "s_down,slave,disconnected"
2)  1) "name"
    2) "172.16.101.59:6379"
    3) "ip"
    4) "172.16.101.59"
    9) "flags"
   10) "slave"
After the failed old master is repaired and restarted, it automatically becomes a slave of the new master:
[root@sht-sgmhadoopdn-01 redis]# src/redis-server redis.conf

[root@sht-sgmhadoopcm-01 redis]# tail -f sentinel.log
7541:X 05 Aug 23:11:10.556 # -sdown slave 172.16.101.58:6379 172.16.101.58 6379 @ mymaster 172.16.101.60 6379
7541:X 05 Aug 23:11:20.518 * +convert-to-slave slave 172.16.101.58:6379 172.16.101.58 6379 @ mymaster 172.16.101.60 6379
Summary:
Failover process analysis:
Each Sentinel detects the master is down with an +sdown event.
This event is later escalated to +odown, which means that multiple Sentinels agree about the fact the master is not reachable.
Sentinels vote for a Sentinel that will start the first failover attempt.
The failover happens.
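Each sentinel periodically sends PING to the Redis master to check that it is alive. If the master crashes and does not reply within down-after-milliseconds (30 s here),
the sentinel first marks it as subjectively down (SDOWN). The three sentinels then exchange their views, and as soon as two of them (the quorum) consider the master down, it is marked objectively down (ODOWN).
Next the sentinels vote, and the sentinel that obtains two votes (a majority of three) performs the failover and promotes a slave to master.
The 30 s wait therefore happens before SDOWN, not after the vote; once the leader is elected, the switch itself completes within a fraction of a second.
Once the failed old master is repaired and restarted, it rejoins as a slave of the new master.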
FAQ
Error1:
[root@sht-sgmhadoopcm-01 redis]# src/redis-cli -h 172.16.101.54 -p 26379
172.16.101.54:26379> ping
(error) DENIED Redis is running in protected mode because protected mode is enabled, no bind address was specified, no authentication password is requested to clients. In this mode connections are only accepted from the loopback interface. If you want to connect from external computers to Redis you may adopt one of the following solutions:
1) Just disable protected mode sending the command 'CONFIG SET protected-mode no' from the loopback interface by connecting to Redis from the same host the server is running, however MAKE SURE Redis is not publicly accessible from internet if you do so. Use CONFIG REWRITE to make this change permanent.
2) Alternatively you can just disable the protected mode by editing the Redis configuration file, and setting the protected mode option to 'no', and then restarting the server.
3) If you started the server manually just for testing, restart it with the '--protected-mode no' option.
4) Setup a bind address or an authentication password.
NOTE: You only need to do one of the above things in order for the server to start accepting connections from the outside.

Solution: set protected-mode to no in sentinel.conf and restart the sentinel.
[root@sht-sgmhadoopcm-01 redis]# vim sentinel.conf
protected-mode no
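The DENIED message names the three conditions under which external clients are refused, and each of the four fixes breaks at least one of them. The predicate below restates those conditions exactly as this error text describes them. It is a hypothetical helper for reasoning about a config, not a Redis API, and other Redis versions may refine the rules.

```python
from typing import Optional, Sequence


def only_loopback_allowed(protected_mode: bool,
                          bind: Sequence[str],
                          password: Optional[str]) -> bool:
    """True when, per the DENIED error text, external clients are refused:
    protected mode is on, no bind address is set, and no password is required."""
    return protected_mode and not bind and not password


if __name__ == "__main__":
    print(only_loopback_allowed(True, [], None))                 # the failing sentinel
    print(only_loopback_allowed(False, [], None))                # after protected-mode no
    print(only_loopback_allowed(True, ["172.16.101.54"], None))  # bind address set instead
```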
Reference: Redis Sentinel Documentation https://redis.io/topics/sentinel