Redis High Availability: Sentinel Deployment in Practice and Analysis
mhr18 2024-10-26 10:54
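We use a one-master, two-replica replication setup as a worked example of deploying Sentinel. The environment, as it can be read from the outputs and logs in this article, is as follows
- ec2-redis-01: 192.168.100.2, Redis master (port 6379) and Sentinel (port 26379)
- ec2-redis-02: 192.168.100.3, Redis replica (port 6379) and Sentinel (port 26379)
- ec2-redis-03: 192.168.100.4, Redis replica (port 6379) and Sentinel (port 26379)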
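Redis replication deployment
For how to set up replication, see the separate article Redis高可用之复制搭建(REPLICATION)
After deployment the state of each node is as follows
- Master node ec2-redis-01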
> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.100.4,port=6379,state=online,offset=367920,lag=0
slave1:ip=192.168.100.3,port=6379,state=online,offset=367920,lag=0
master_replid:4668490d0bd8e3d2967fe33eb49efde8af6a537b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:367920
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:1
repl_backlog_histlen:367920
> role
1) "master"
2) (integer) 368032
3) 1) 1) "192.168.100.4"
      2) "6379"
      3) "368018"
   2) 1) "192.168.100.3"
      2) "6379"
      3) "368032"
- Replica node ec2-redis-02
> info replication
# Replication
role:slave
master_host:192.168.100.2
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:368298
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:4668490d0bd8e3d2967fe33eb49efde8af6a537b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:368298
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:43
repl_backlog_histlen:368256
127.0.0.1:6379> role
1) "slave"
2) "192.168.100.2"
3) (integer) 6379
4) "connected"
5) (integer) 368312
- Replica node ec2-redis-03
> info replication
# Replication
role:slave
master_host:192.168.100.2
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:368438
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:4668490d0bd8e3d2967fe33eb49efde8af6a537b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:368438
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:1
repl_backlog_histlen:368438
127.0.0.1:6379> role
1) "slave"
2) "192.168.100.2"
3) (integer) 6379
4) "connected"
5) (integer) 368452
Sentinel deployment
- Create the Sentinel main configuration file sentinel.conf with the following content
protected-mode no
port 26379
daemonize yes
supervised systemd
pidfile "/var/run/redis-sentinel.pid"
logfile "/usr/local/redis/log/sentinel.log"
dir "/usr/local/redis/data"
sentinel deny-scripts-reconfig yes
sentinel monitor userservice 192.168.100.2 6379 2
sentinel down-after-milliseconds userservice 5000
sentinel failover-timeout userservice 30000
- Put Sentinel under systemd (systemctl) management
# cat /usr/lib/systemd/system/sentinel.service
[Unit]
Description=Redis Sentinel provides high availability for Redis.
Documentation=https://redis.io/topics/sentinel
After=syslog.target
After=network.target
# Note: Sentinel is best started after the Redis service has started.
After=redis.target
[Service]
Type=notify
User=redis
Group=dba
Restart=always
# Disable OOM kill on the Sentinel
OOMScoreAdjust=-1000
Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
Environment=PG_OOM_ADJUST_VALUE=0
ExecStart=/usr/local/redis/src/redis-sentinel /usr/local/redis/conf/sentinel.conf
ExecStop=/usr/local/redis/src/redis-cli -p 26379 shutdown
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable sentinel
# systemctl start sentinel
# systemctl status sentinel
- What changes after Sentinel starts
- Information discovered by Sentinel
17397:X 04 Jul 2020 21:02:31.744 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
17397:X 04 Jul 2020 21:02:31.744 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=17397, just started
17397:X 04 Jul 2020 21:02:31.744 # Configuration loaded
17397:X 04 Jul 2020 21:02:31.744 * supervised by systemd, will signal readiness
17397:X 04 Jul 2020 21:02:31.746 * Running mode=sentinel, port=26379.
17397:X 04 Jul 2020 21:02:31.752 # Sentinel ID is 85519e3d32442ac8c87329f381079007a08f9b9f
17397:X 04 Jul 2020 21:02:31.753 # +monitor master userservice 192.168.100.2 6379 quorum 2
17397:X 04 Jul 2020 21:02:31.758 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:31.761 * +slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:33.786 * +sentinel sentinel 620c49f4f658095dae056e670822b634a257cc23 192.168.100.4 26379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:33.810 * +sentinel sentinel d7ec4699a9744e8b806efba64a12adf74fdf367f 192.168.100.3 26379 @ userservice 192.168.100.2 6379
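The log of each Sentinel shows that, starting from the master address we configured, it found all replicas through the INFO command, and found the other Sentinels by publishing and subscribing to the Redis channel __sentinel__:hello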
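- Updates to the Sentinel main configuration file
After startup, each Sentinel automatically appended the following dynamic parameters to its main configuration file (the explanatory comment lines are ours)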
# Generated by CONFIG REWRITE
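# Configuration epoch of the master
sentinel config-epoch userservice 0
# Epoch of the most recent leader election this Sentinel voted in
sentinel leader-epoch userservice 0
# Replicas of the master
sentinel known-replica userservice 192.168.100.3 6379
sentinel known-replica userservice 192.168.100.4 6379
# Current epoch; as long as no failover has happened it is the same on every Sentinel
sentinel current-epoch 0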
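Inspecting Sentinel during normal monitoring
- Sentinel publish and subscribe messages
As described above, Sentinels discover each other by publishing and subscribing to messages on the Redis channel __sentinel__:hello.
We now subscribe to that channel to look at what the Sentinels publish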
> PSUBSCRIBE __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "__sentinel__:hello"
3) (integer) 1
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.2,26379,85519e3d32442ac8c87329f381079007a08f9b9f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.3,26379,d7ec4699a9744e8b806efba64a12adf74fdf367f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.2,26379,85519e3d32442ac8c87329f381079007a08f9b9f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.3,26379,d7ec4699a9744e8b806efba64a12adf74fdf367f,0,userservice,192.168.100.2,6379,0"
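Every Sentinel publishes to this channel. The message format is
<source Sentinel ip>, <source Sentinel port>, <source Sentinel run id>, <source Sentinel current epoch>, <master name>, <master ip>, <master port>, <master config epoch>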
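The message format above can be unpacked with a few lines of Python. This is only a minimal sketch: `SentinelHello` and `parse_hello` are names made up for illustration, and the field names are our own labels for the eight comma-separated values.

```python
from typing import NamedTuple


class SentinelHello(NamedTuple):
    """One payload published on the __sentinel__:hello channel."""
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> SentinelHello:
    """Split the comma-separated payload into its eight fields."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = fields
    return SentinelHello(ip, int(port), runid, int(epoch),
                         name, m_ip, int(m_port), int(m_epoch))


if __name__ == "__main__":
    # Payload copied from the PSUBSCRIBE output above.
    msg = ("192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,"
           "0,userservice,192.168.100.2,6379,0")
    hello = parse_hello(msg)
    print(hello.sentinel_ip, hello.sentinel_port, hello.master_name)
```

Comparing `master_config_epoch` across messages is one way to spot a Sentinel that still advertises an outdated master after a failover.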
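- Common Sentinel commands
A Sentinel process is a Redis process running in a special mode with a reduced command set, so we can connect to it with redis-cli and use its command line for day-to-day administration. Sentinel listens on port 26379.
- Show the state of a given master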
> sentinel master userservice
1) "name"
2) "userservice"
3) "ip"
4) "192.168.100.2"
5) "port"
6) "6379"
7) "runid"
8) "ce24ee8b27e42b9762ec595deb7542abf9ac4adc"
9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "688"
19) "last-ping-reply"
20) "688"
21) "down-after-milliseconds"
22) "5000"
23) "info-refresh"
24) "9582"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "892700"
29) "config-epoch"
30) "0"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "30000"
39) "parallel-syncs"
40) "1"
- Get the address of the current master
> sentinel get-master-addr-by-name userservice
1) "192.168.100.2"
2) "6379"
For brevity the remaining commands are not demonstrated one by one; they are listed in the Redis Sentinel documentation.
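The reply of `sentinel master userservice` shown above is a flat list of alternating field names and values. A small sketch of how a script might turn such a reply into a dictionary and check the flags follows. The helper names `pairs_to_dict` and `is_down` are hypothetical, and the sample list is a shortened copy of the output above.

```python
from typing import Dict, List


def pairs_to_dict(reply: List[str]) -> Dict[str, str]:
    """Turn a flat [name, value, name, value, ...] reply into a dict."""
    if len(reply) % 2 != 0:
        raise ValueError("reply must contain an even number of items")
    return dict(zip(reply[0::2], reply[1::2]))


def is_down(master: Dict[str, str]) -> bool:
    """True if the flags field carries s_down or o_down."""
    flags = master.get("flags", "").split(",")
    return "s_down" in flags or "o_down" in flags


if __name__ == "__main__":
    # Shortened copy of the SENTINEL master reply shown in the article.
    reply = ["name", "userservice", "ip", "192.168.100.2", "port", "6379",
             "flags", "master", "num-slaves", "2",
             "num-other-sentinels", "2", "quorum", "2"]
    master = pairs_to_dict(reply)
    print(master["ip"], master["port"], "down" if is_down(master) else "up")
```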
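Failover test
- Failover
We stop the Redis service on the master node to simulate a master failure
# systemctl stop redis
Then we look at the log output on each of the three nodes
ec2-redis-01 Sentinel log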
17397:X 04 Jul 2020 21:33:40.687 # +sdown master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.740 # +odown master userservice 192.168.100.2 6379 #quorum 3/2
17397:X 04 Jul 2020 21:33:40.740 # +new-epoch 1
17397:X 04 Jul 2020 21:33:40.740 # +try-failover master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.742 # +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
17397:X 04 Jul 2020 21:33:40.745 # d7ec4699a9744e8b806efba64a12adf74fdf367f voted for d7ec4699a9744e8b806efba64a12adf74fdf367f 1
17397:X 04 Jul 2020 21:33:40.757 # 620c49f4f658095dae056e670822b634a257cc23 voted for 85519e3d32442ac8c87329f381079007a08f9b9f 1
17397:X 04 Jul 2020 21:33:40.813 # +elected-leader master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.813 # +failover-state-select-slave master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.866 # +selected-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.866 * +failover-state-send-slaveof-noone slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.950 * +failover-state-wait-promotion slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.467 # +promoted-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.467 # +failover-state-reconf-slaves master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.536 * +slave-reconf-sent slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.839 # -odown master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-inprog slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-done slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 # +failover-end master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 # +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:47.548 # +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
ec2-redis-01 Redis log
1447:M 04 Jul 2020 21:33:35.576 # User requested shutdown...
1447:M 04 Jul 2020 21:33:35.576 * Calling fsync() on the AOF file.
1447:M 04 Jul 2020 21:33:35.576 * Saving the final RDB snapshot before exiting.
1447:M 04 Jul 2020 21:33:35.607 * DB saved on disk
1447:M 04 Jul 2020 21:33:35.607 * Removing the pid file.
1447:M 04 Jul 2020 21:33:35.607 # Redis is now ready to exit, bye bye...
ec2-redis-02 Sentinel log
16627:X 04 Jul 2020 21:33:45.384 # +sdown master userservice 192.168.100.2 6379
16627:X 04 Jul 2020 21:33:45.469 # +odown master userservice 192.168.100.2 6379 #quorum 2/2
16627:X 04 Jul 2020 21:33:45.469 # +new-epoch 1
16627:X 04 Jul 2020 21:33:45.469 # +try-failover master userservice 192.168.100.2 6379
16627:X 04 Jul 2020 21:33:45.472 # +vote-for-leader d7ec4699a9744e8b806efba64a12adf74fdf367f 1
16627:X 04 Jul 2020 21:33:45.473 # 85519e3d32442ac8c87329f381079007a08f9b9f voted for 85519e3d32442ac8c87329f381079007a08f9b9f 1
16627:X 04 Jul 2020 21:33:45.485 # 620c49f4f658095dae056e670822b634a257cc23 voted for 85519e3d32442ac8c87329f381079007a08f9b9f 1
16627:X 04 Jul 2020 21:33:46.263 # +config-update-from sentinel 85519e3d32442ac8c87329f381079007a08f9b9f 192.168.100.2 26379 @ userservice 192.168.100.2 6379
16627:X 04 Jul 2020 21:33:46.264 # +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
16627:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
16627:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
16627:X 04 Jul 2020 21:33:51.269 # +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
ec2-redis-02 Redis log
1448:S 04 Jul 2020 21:33:40.336 # Connection with master lost.
1448:S 04 Jul 2020 21:33:40.336 * Caching the disconnected master state.
1448:S 04 Jul 2020 21:33:41.004 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:41.005 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:41.005 # Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:42.015 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:42.015 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:42.015 # Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:43.020 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:43.020 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:43.021 # Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:44.030 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:44.030 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:44.030 # Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:45.037 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:45.037 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:45.038 # Error condition on socket for SYNC: Connection refused
1448:M 04 Jul 2020 21:33:45.677 # Setting secondary replication ID to 4668490d0bd8e3d2967fe33eb49efde8af6a537b, valid up to offset: 1603594. New replication ID is 727e32660ce348fc07a7805b069
1448:M 04 Jul 2020 21:33:45.677 * Discarding previously cached master state.
1448:M 04 Jul 2020 21:33:45.677 * MASTER MODE enabled (user request from 'id=43 addr=192.168.100.2:27104 fd=10 name=sentinel-85519e3d-cmd age=1869 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qb
1448:M 04 Jul 2020 21:33:45.678 # CONFIG REWRITE executed with success.
1448:M 04 Jul 2020 21:33:46.961 * Replica 192.168.100.4:6379 asks for synchronization
1448:M 04 Jul 2020 21:33:46.961 * Partial resynchronization request from 192.168.100.4:6379 accepted. Sending 743 bytes of backlog starting from offset 1603594.
ec2-redis-03 Sentinel log
16556:X 04 Jul 2020 21:33:45.343 # +sdown master userservice 192.168.100.2 6379
16556:X 04 Jul 2020 21:33:45.473 # +new-epoch 1
16556:X 04 Jul 2020 21:33:45.484 # +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
16556:X 04 Jul 2020 21:33:46.263 # +config-update-from sentinel 85519e3d32442ac8c87329f381079007a08f9b9f 192.168.100.2 26379 @ userservice 192.168.100.2 6379
16556:X 04 Jul 2020 21:33:46.263 # +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:51.296 # +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
ec2-redis-03 Redis log
1444:S 04 Jul 2020 21:33:40.336 # Connection with master lost.
1444:S 04 Jul 2020 21:33:40.336 * Caching the disconnected master state.
1444:S 04 Jul 2020 21:33:40.900 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:40.900 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:40.905 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:41.910 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:41.911 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:41.911 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:42.919 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:42.919 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:42.920 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:43.929 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:43.929 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:43.932 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:44.937 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:44.938 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:44.938 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:45.947 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:45.947 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:45.948 # Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:46.263 * REPLICAOF 192.168.100.3:6379 enabled (user request from 'id=39 addr=192.168.100.2:41530 fd=10 name=sentinel-85519e3d-cmd age=1870 idle=0 flags=x db=0 sub=0 nts=r cmd=exec')
1444:S 04 Jul 2020 21:33:46.264 # CONFIG REWRITE executed with success.
1444:S 04 Jul 2020 21:33:46.957 * Connecting to MASTER 192.168.100.3:6379
1444:S 04 Jul 2020 21:33:46.957 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:46.957 * Non blocking connect for SYNC fired the event.
1444:S 04 Jul 2020 21:33:46.958 * Master replied to PING, replication can continue...
1444:S 04 Jul 2020 21:33:46.960 * Trying a partial resynchronization (request 4668490d0bd8e3d2967fe33eb49efde8af6a537b:1603594).
1444:S 04 Jul 2020 21:33:46.961 * Successful partial resynchronization with master.
1444:S 04 Jul 2020 21:33:46.961 # Master replication ID changed to 727e32660ce348fc07a7805b06954ef9752821b7
1444:S 04 Jul 2020 21:33:46.961 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
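First, when a Sentinel detects that the master is unreachable, it marks the master as subjectively down (sdown). On ec2-redis-01 this happens about 5 seconds after the master shut down at 21:33:35, which matches down-after-milliseconds 5000. All three Sentinel logs contain such an entry, for example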
17397:X 04 Jul 2020 21:33:40.687 # +sdown master userservice 192.168.100.2 6379
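When comparing logs from three nodes it can help to extract the events mechanically. The following is a rough sketch that parses Sentinel log lines of the shape shown in this article. The regular expression is inferred from those sample lines only, and `SentinelEvent` and `parse_event` are illustrative names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# pid:role, timestamp, level mark, then a +event or -event and optional details.
LINE_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"[.\-*#] "
    r"(?P<event>[+-][a-z-]+)(?: (?P<details>.*))?$"
)


class SentinelEvent(NamedTuple):
    pid: int
    timestamp: datetime
    event: str
    details: str


def parse_event(line: str) -> Optional[SentinelEvent]:
    """Return the event in a Sentinel log line, or None if it has none."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S.%f")
    return SentinelEvent(int(m.group("pid")), ts, m.group("event"),
                         m.group("details") or "")


if __name__ == "__main__":
    # Two lines copied from the ec2-redis-01 Sentinel log.
    sdown = parse_event("17397:X 04 Jul 2020 21:33:40.687 # +sdown master "
                        "userservice 192.168.100.2 6379")
    switch = parse_event("17397:X 04 Jul 2020 21:33:42.544 # +switch-master "
                         "userservice 192.168.100.2 6379 192.168.100.3 6379")
    # Time between detecting the failure and switching to the new master.
    print(switch.timestamp - sdown.timestamp)
```

Lines without a `+event` or `-event` token, such as the "voted for" entries, simply return `None`.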
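Once a Sentinel has marked the master as subjectively down, it sends the command SENTINEL is-master-down-by-addr to the other Sentinels to ask whether they also consider the master down. When enough Sentinels agree, the master enters the objectively down state (odown). The quorum in our Sentinel configuration is 2, so two Sentinels agreeing is enough. In the entries below, "#quorum 3/2" means that 3 Sentinels agree while 2 are required
Sentinel ec2-redis-01
17397:X 04 Jul 2020 21:33:40.740 # +odown master userservice 192.168.100.2 6379 #quorum 3/2
Sentinel ec2-redis-02
16627:X 04 Jul 2020 21:33:45.469 # +odown master userservice 192.168.100.2 6379 #quorum 2/2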
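The step from sdown to odown described above comes down to a simple count. A minimal sketch, assuming each Sentinel counts itself plus the peers that answered "down"; the function name is illustrative.

```python
def is_objectively_down(local_sdown: bool, peers_agreeing: int,
                        quorum: int) -> bool:
    """Decide odown from this Sentinel's own view and its peers' answers.

    local_sdown:    this Sentinel has marked the master sdown
    peers_agreeing: other Sentinels that reported the master as down
    quorum:         the last argument of 'sentinel monitor'
    """
    if not local_sdown:
        # A Sentinel that still sees the master as up never raises odown.
        return False
    return 1 + peers_agreeing >= quorum


if __name__ == "__main__":
    # ec2-redis-01 logged "#quorum 3/2": itself plus two peers.
    print(is_objectively_down(True, 2, quorum=2))
    # ec2-redis-02 logged "#quorum 2/2": itself plus one peer.
    print(is_objectively_down(True, 1, quorum=2))
    # A lone Sentinel with quorum 2 cannot reach odown.
    print(is_objectively_down(True, 0, quorum=2))
```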
Once the master is objectively down, the Sentinels start a failover, as the following log entries show
Sentinel ec2-redis-01
17397:X 04 Jul 2020 21:33:40.740 # +try-failover master userservice 192.168.100.2 6379
Sentinel ec2-redis-02
16627:X 04 Jul 2020 21:33:45.469 # +try-failover master userservice 192.168.100.2 6379
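Before the failover is carried out, the Sentinels have to elect a leader
Vote cast by Sentinel ec2-redis-01
17397:X 04 Jul 2020 21:33:40.742 # +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
Vote cast by Sentinel ec2-redis-02
16627:X 04 Jul 2020 21:33:45.472 # +vote-for-leader d7ec4699a9744e8b806efba64a12adf74fdf367f 1
Vote cast by Sentinel ec2-redis-03
16556:X 04 Jul 2020 21:33:45.484 # +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
As shown above, the Sentinel with run ID 85519e3d32442ac8c87329f381079007a08f9b9f (ec2-redis-01) received 2 votes, its own and that of ec2-redis-03, and became the leader. The log of Sentinel ec2-redis-01 also shows that it won the majority and was elected leader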
17397:X 04 Jul 2020 21:33:40.813 # +elected-leader master userservice 192.168.100.2 6379
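The vote count above can be reproduced with a small tally. This is a sketch under the assumption that a candidate wins only if it is voted for by a majority of all Sentinels and by at least quorum of them. The function name and the shortened run IDs are illustrative.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int,
                 quorum: int) -> Optional[str]:
    """Return the winning candidate, or None if nobody has enough votes.

    votes maps each voting Sentinel to the candidate it voted for.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    if count >= majority and count >= quorum:
        return candidate
    return None


if __name__ == "__main__":
    # Votes from the logs above, run IDs shortened to 8 characters.
    votes = {
        "85519e3d": "85519e3d",  # ec2-redis-01 voted for itself
        "d7ec4699": "d7ec4699",  # ec2-redis-02 voted for itself
        "620c49f4": "85519e3d",  # ec2-redis-03 voted for ec2-redis-01
    }
    print(elect_leader(votes, total_sentinels=3, quorum=2))
```

With three Sentinels the majority is 2, so the two votes for 85519e3d are enough, while a 1-1-1 split would elect nobody for that epoch.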
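Once elected, the leader Sentinel selects the best replica to promote. Both replicas have slave_priority 100 here, so priority alone cannot decide; Sentinel then prefers the replica with the larger replication offset and, if that is also equal, the lexicographically smaller run ID. The entry below shows that the leader chose ec2-redis-02 (192.168.100.3).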
17397:X 04 Jul 2020 21:33:40.866 # +selected-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
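The ordering used for picking a replica can be sketched as a sort key: lower priority value first (0 means never promote), then the larger replication offset, then the smaller run ID. The article does not show the offsets or run IDs at failover time, so the values in the example are invented for illustration, as are the names `Replica` and `select_replica`. Real Sentinels also skip replicas that are down or were disconnected for too long, which this sketch leaves out.

```python
from typing import List, NamedTuple, Optional


class Replica(NamedTuple):
    addr: str
    priority: int      # slave_priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream was processed
    run_id: str


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lower priority value wins, then higher offset, then smaller run ID.
    return min(candidates,
               key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    # Priorities are from the article; offsets and run IDs are hypothetical.
    replicas = [
        Replica("192.168.100.4:6379", 100, 1603593, "runid-bbb"),
        Replica("192.168.100.3:6379", 100, 1603593, "runid-aaa"),
    ]
    print(select_replica(replicas).addr)
```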
The leader Sentinel then carries out the failover
17397:X 04 Jul 2020 21:33:40.866 * +failover-state-send-slaveof-noone slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.950 * +failover-state-wait-promotion slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.467 # +promoted-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
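After the promotion has succeeded (that is, REPLICAOF NO ONE has been executed successfully on the replica being promoted), the leader Sentinel reconfigures the old master and the replicas that were attached to it
- It tries to demote the old master to a replica of the new master
- It reconfigures the replicas of the old master to replicate from the new master
- It propagates the new configuration to the other Sentinels through publish/subscribe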
17397:X 04 Jul 2020 21:33:41.467 # +failover-state-reconf-slaves master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.536 * +slave-reconf-sent slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.839 # -odown master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-inprog slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-done slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 # +failover-end master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 # +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:47.548 # +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
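The logs of Sentinel ec2-redis-02 and Sentinel ec2-redis-03 show that the leader's configuration takes precedence: the other Sentinels update their own configuration from the leader's, as shown below.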
16627:X 04 Jul 2020 21:33:46.263 # +config-update-from sentinel 85519e3d32442ac8c87329f381079007a08f9b9f 192.168.100.2 26379 @ userservice 192.168.100.2 6379
Finally, Sentinel also has each replica write the new master's address into its own main configuration file, as the Redis log shows
1444:S 04 Jul 2020 21:33:46.264 # CONFIG REWRITE executed with success.
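Note that a replica of the old master, once it reconnects to the new master, does not need a full copy of the data set. It only fetches the part it is missing (a partial resynchronization).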
1444:S 04 Jul 2020 21:33:46.957 * Connecting to MASTER 192.168.100.3:6379
1444:S 04 Jul 2020 21:33:46.957 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:46.957 * Non blocking connect for SYNC fired the event.
1444:S 04 Jul 2020 21:33:46.958 * Master replied to PING, replication can continue...
1444:S 04 Jul 2020 21:33:46.960 * Trying a partial resynchronization (request 4668490d0bd8e3d2967fe33eb49efde8af6a537b:1603594).
1444:S 04 Jul 2020 21:33:46.961 * Successful partial resynchronization with master.
1444:S 04 Jul 2020 21:33:46.961 # Master replication ID changed to 727e32660ce348fc07a7805b06954ef9752821b7
1444:S 04 Jul 2020 21:33:46.961 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
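- The old master comes back online
What happens when the old master comes back online?
We start the old master again
# systemctl start redis
First, Sentinel demotes the node to a replica of the new master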
17397:X 04 Jul 2020 22:53:18.228 # -sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 22:53:28.179 * +convert-to-slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
Once it has become a replica, the demoted master tries a partial resynchronization from the new master
18728:S 04 Jul 2020 22:53:29.176 * Trying a partial resynchronization (request 839c9173e6e76ea20ebf46f36c963bc9f81ac84c:1).
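The new master rejects the request, because the replication ID the node asks for matches neither of its own replication IDs.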
1448:M 04 Jul 2020 22:53:33.904 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '839c9173e6e76ea20ebf46f36c963bc9f81ac84c', my replication IDs are '727e32660ce348fc07a7805b06954ef9752821b7' and '4668490d0bd8e3d2967fe33eb49efde8af6a537b')
The demoted master therefore discards its old data set and performs a full resynchronization from the new master.
18728:S 04 Jul 2020 22:53:29.180 * Full resync from master: 727e32660ce348fc07a7805b06954ef9752821b7:2641396
18728:S 04 Jul 2020 22:53:29.180 * Discarding previously cached master state.
18728:S 04 Jul 2020 22:53:29.303 * MASTER <-> REPLICA sync: receiving 923044 bytes from master
18728:S 04 Jul 2020 22:53:29.361 * MASTER <-> REPLICA sync: Flushing old data
18728:S 04 Jul 2020 22:53:29.374 * MASTER <-> REPLICA sync: Loading DB in memory
18728:S 04 Jul 2020 22:53:29.414 * MASTER <-> REPLICA sync: Finished with success
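The two outcomes seen in the logs, a partial resynchronization for 192.168.100.4 and a full one for the old master, can be illustrated with a simplified acceptance check. This is a sketch of the idea rather than Redis's actual code: the replication IDs and the offset 1603594 come from the logs above, while the backlog bounds are assumptions chosen to be consistent with "Sending 743 bytes of backlog starting from offset 1603594".

```python
from typing import NamedTuple


class MasterState(NamedTuple):
    replid: str                # current replication ID
    replid2: str               # previous replication ID (before promotion)
    second_replid_offset: int  # replid2 is valid up to this offset
    backlog_first_offset: int  # first offset still held in the backlog
    backlog_histlen: int       # number of bytes held in the backlog


def can_partial_resync(m: MasterState, req_replid: str,
                       req_offset: int) -> bool:
    """True if the master can serve the request from its backlog."""
    same_history = req_replid == m.replid or (
        req_replid == m.replid2 and req_offset <= m.second_replid_offset
    )
    if not same_history:
        return False
    backlog_end = m.backlog_first_offset + m.backlog_histlen
    return m.backlog_first_offset <= req_offset <= backlog_end


if __name__ == "__main__":
    new_master = MasterState(
        replid="727e32660ce348fc07a7805b06954ef9752821b7",
        replid2="4668490d0bd8e3d2967fe33eb49efde8af6a537b",
        second_replid_offset=1603594,
        backlog_first_offset=1,    # assumed
        backlog_histlen=1604336,   # assumed
    )
    # Replica 192.168.100.4 asks with the old ID at the switch-over offset.
    print(can_partial_resync(
        new_master, "4668490d0bd8e3d2967fe33eb49efde8af6a537b", 1603594))
    # The restarted old master asks with an ID the new master never used.
    print(can_partial_resync(
        new_master, "839c9173e6e76ea20ebf46f36c963bc9f81ac84c", 1))
```

The restarted old master generated a fresh replication ID on startup, which is why its request cannot match and it falls back to a full resynchronization.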
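One more point: if a failover attempt does not go through, for example because not enough Sentinels support it, the Sentinel does not retry immediately. It waits until twice the failover-timeout has passed since the previous attempt started (60 seconds with the failover-timeout of 30000 ms configured above), and its log shows the following.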
1451:X 05 Jul 2020 11:34:14.518 # Next failover delay: I will not start a failover before Sun Jul 5 11:35:13 2020
That concludes this brief introduction to deploying Sentinel in practice and to the failover process.