Dual-Node Hot Standby with keepalived
This article describes how keepalived works and how to use it to build a dual-node hot-standby setup.
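1. Introduction to keepalived
keepalived is written entirely in C. The project's main goal is to provide simple and robust facilities for load balancing and high availability on Linux systems. The load-balancing framework relies on the well-known and widely used LVS (Linux Virtual Server) kernel module, which provides Layer 4 load balancing. keepalived implements a set of checkers that dynamically maintain and manage the load-balanced server pool according to the health of its members. High availability, on the other hand, is achieved with VRRP (Virtual Router Redundancy Protocol), a fundamental building block for router failover.
In addition, keepalived implements a set of hooks into the VRRP finite state machine to provide low-level, high-speed protocol interactions. To detect network failures as quickly as possible, keepalived also implements the BFD protocol; VRRP state transitions can take BFD hints into account to drive faster state changes. The keepalived frameworks can be used independently or together (for example with LVS) to build a more resilient infrastructure.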
In short, keepalived provides two main functions:
- Health checking of LVS systems
- An implementation of the VRRPv2 stack to handle load-balancer failover
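Health checking and failover are keepalived's two core functions. Health checking means probing the real servers behind the load balancer with a TCP three-way handshake, an HTTP request, a UDP echo request, and similar methods to verify that they are alive. Failover applies to load balancers deployed in master/backup mode: VRRP (see the RFC) maintains a heartbeat between the master and backup load balancers, and when the master fails the backup takes over its traffic, which minimizes traffic loss and keeps the service stable.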
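As a rough illustration of the TCP variant of such a health check, the sketch below treats a real server as alive when a TCP connection to it can be established within a timeout. This is not keepalived's own checker code; the function name tcp_check and the 3-second default timeout are assumptions made for this example.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A successful connect means the three-way handshake completed,
    which is the liveness criterion described above.
    """
    try:
        # create_connection performs the handshake; the socket is closed
        # again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the server as down.
        return False
```

For example, tcp_check("192.168.79.131", 80) would report whether the third HTTP server of the test environment used later in this article accepts connections on port 80.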
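1.1 Kernel components used by keepalived
keepalived uses four Linux kernel components:
- LVS framework: load-balances the data traffic
- Netfilter framework: supports NAT and IP masquerading
- Netlink interface: sets and removes VRRP virtual IPs on network interfaces
- Multicast: sends VRRP advertisements to the reserved VRRP multicast group (224.0.0.18)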
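The figure above shows keepalived's functional architecture, which is broadly divided into two layers: user space and kernel space.
1) Kernel space
It mainly consists of two parts: IPVS (IP Virtual Server, which load-balances network services) and NETLINK (which provides advanced routing and other related networking functions).
2) User space
- WatchDog: monitors the state of the checkers and VRRP processes
- VRRP Stack: handles failover between load balancers; if there is only one load balancer, VRRP is not required.
- Checkers: perform health checking of the real servers, which is keepalived's most important function. In other words, a load-balancing setup can do without the VRRP Stack, but not without health checking.
- IPVS wrapper: pushes the user-configured rules to the kernel IPVS code
- Netlink Reflector: sets the VRRP VIP addresses, among other things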
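2. The VRRP protocol
VRRP stands for Virtual Router Redundancy Protocol. It can be seen as a fault-tolerance protocol that makes routers highly available: N routers providing the same function are grouped into a router group, which has one master and several backups but appears to the outside as a single device, a virtual router. The virtual router owns a virtual IP (VIP, which is the default gateway of the other machines on the router's LAN). The master that holds this IP actually answers ARP requests and forwards IP packets, while the other routers in the group stand by as backups. The master sends multicast advertisements; when a backup receives no VRRP packet within the timeout, it considers the master down, and a backup is then elected as the new master according to VRRP priority, which keeps the router highly available.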
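The timeout a backup waits for is defined in the VRRPv2 specification (RFC 3768) as the Master_Down_Interval: three advertisement intervals plus a skew time derived from the backup's own priority. A minimal sketch of that arithmetic follows; the function names are ours, and the priorities in the note below are illustrative values, not taken from any configuration in this article.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds, per RFC 3768: (256 - Priority) / 256."""
    if not 1 <= priority <= 255:
        raise ValueError("priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before taking over.

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    return 3 * advert_int + skew_time(priority)
```

With a 1-second advertisement interval, a backup with priority 100 waits 3.609375 seconds and one with priority 90 waits 3.6484375 seconds, so the higher-priority backup times out first and claims the master role.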
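In VRRP implementations, the virtual router uses 00-00-5E-00-01-XX as its virtual MAC address, where XX is the unique VRID (Virtual Router IDentifier); at any given time only one physical router holds this address. The physical routers in the virtual router periodically send advertisements to the multicast IP address 224.0.0.18. Each router has a priority between 1 and 255, and the router with the highest priority becomes the master. Lowering the master's priority lets a router in the backup state preempt the master role; when two backups have the same priority, the one with the larger IP address becomes master and takes over the virtual IP.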
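The two rules above, the virtual MAC derived from the VRID and the election by priority with the IP address as tie-breaker, can be sketched as follows. This is a simplified model for illustration only: a real VRRP election happens through advertisement exchange and timers, not a single comparison. The function names and the sample priorities are assumptions.

```python
import ipaddress
from typing import Iterable, Tuple


def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual MAC 00-00-5E-00-01-XX for a VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00-00-5E-00-01-{vrid:02X}"


def elect_master(routers: Iterable[Tuple[int, str]]) -> Tuple[int, str]:
    """Pick the master from (priority, ip) pairs.

    Highest priority wins; on a tie, the numerically larger IPv4
    address wins.
    """
    return max(
        routers,
        key=lambda r: (r[0], int(ipaddress.IPv4Address(r[1]))),
    )
```

For instance, elect_master([(100, "192.168.79.128"), (90, "192.168.79.129")]) selects the router at 192.168.79.128, while two routers that both have priority 100 would resolve in favor of 192.168.79.129.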
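3. Comparison with heartbeat/corosync
Which of the three cluster components, Heartbeat, Corosync, and Keepalived, should you choose? First, note that Keepalived is not the same kind of tool as Heartbeat and Corosync (those two belong to the same category). Keepalived is based on VRRP (Virtual Router Redundancy Protocol), whereas Heartbeat and Corosync provide high availability based on hosts or network services. Put simply, Keepalived aims to emulate router high availability, while Heartbeat and Corosync aim to make services highly available.
So keepalived is generally used for front-end high availability; common combinations are LVS + Keepalived, Nginx + Keepalived, and HAProxy + Keepalived. Heartbeat or Corosync provide service high availability; common combinations are Heartbeat v3 (Corosync) + Pacemaker + NFS + Httpd for a highly available web service, and Heartbeat v3 (Corosync) + Pacemaker + NFS + MySQL for a highly available MySQL service. To summarize: Keepalived provides lightweight high availability, is typically used at the front end, needs no shared storage, and is most often deployed between two nodes; Heartbeat or Corosync are typically used for service high availability, usually need shared storage, and are generally used across multiple nodes.
Addendum: some readers may ask which of heartbeat and corosync to choose. We generally choose corosync, because its design is considered superior to heartbeat's; even pacemaker, which was split out of heartbeat, has stated that its future development favors corosync. corosync + pacemaker is therefore the preferred combination today.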
4. Installing keepalived
Our operating system environment is:
# uname -a
Linux localhost.localdomain 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
The installation steps are as follows:
1) Download and unpack the source package
# mkdir keepalived_setup
# cd keepalived_setup
# wget https://www.keepalived.org/software/keepalived-2.0.17.tar.gz
# tar -zxvf keepalived-2.0.17.tar.gz
# cd keepalived-2.0.17
2) Install the dependencies
Consult keepalived's installation notes:
Install the dependency packages you need (if yum install cannot find a package, you can search for it at http://www.rpmfind.net/linux/rpm2html/search.php). The following commands check which of the required packages are already installed:
# yum list installed | grep openssl
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
openssl.x86_64          1:1.0.2k-16.el7_6.1    @updates
openssl-devel.x86_64    1:1.0.2k-16.el7_6.1    @updates
openssl-libs.x86_64     1:1.0.2k-16.el7_6.1    @updates
# yum list installed | grep pcre
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
pcre.x86_64             8.32-17.el7            @base
pcre-devel.x86_64       8.32-17.el7            @base
# yum list installed | grep iptables
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
iptables.x86_64         1.4.21-17.el7          @anaconda
# yum list installed | grep "snmp"
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
net-snmp-libs.x86_64    1:5.7.2-24.el7_2.1     @anaconda
3) Install keepalived
After installation, the following files are in place:
# tree /usr/local/sbin/
/usr/local/sbin/
└── keepalived
0 directories, 1 file
# tree /usr/local/keepalived/
/usr/local/keepalived/
├── bin
│   └── genhash
└── share
    ├── doc
    │   └── keepalived
    │       └── README
    ├── man
    │   ├── man1
    │   │   └── genhash.1
    │   ├── man5
    │   │   └── keepalived.conf.5
    │   └── man8
    │       └── keepalived.8
    └── snmp
        └── mibs
10 directories, 5 files
# tree /etc/keepalived/
/etc/keepalived/
├── keepalived
│   ├── keepalived.conf
│   └── samples
│       ├── client.pem
│       ├── dh1024.pem
│       ├── keepalived.conf.conditional_conf
│       ├── keepalived.conf.fwmark
│       ├── keepalived.conf.HTTP_GET.port
│       ├── keepalived.conf.inhibit
│       ├── keepalived.conf.IPv6
│       ├── keepalived.conf.misc_check
│       ├── keepalived.conf.misc_check_arg
│       ├── keepalived.conf.quorum
│       ├── keepalived.conf.sample
│       ├── keepalived.conf.SMTP_CHECK
│       ├── keepalived.conf.SSL_GET
│       ├── keepalived.conf.status_code
│       ├── keepalived.conf.track_interface
│       ├── keepalived.conf.virtualhost
│       ├── keepalived.conf.virtual_server_group
│       ├── keepalived.conf.vrrp
│       ├── keepalived.conf.vrrp.localcheck
│       ├── keepalived.conf.vrrp.lvs_syncd
│       ├── keepalived.conf.vrrp.routes
│       ├── keepalived.conf.vrrp.rules
│       ├── keepalived.conf.vrrp.scripts
│       ├── keepalived.conf.vrrp.static_ipaddress
│       ├── keepalived.conf.vrrp.sync
│       ├── root.pem
│       ├── sample.misccheck.smbcheck.sh
│       └── sample_notify_fifo.sh
└── sysconfig
    └── keepalived
3 directories, 30 files
Move the files under /etc/keepalived/keepalived/ into /etc/keepalived so that systemd can find them later:
# mv /etc/keepalived/keepalived/* /etc/keepalived/
# rm -rf /etc/keepalived/keepalived/
4) Enable keepalived at boot
By default, the installation has already generated a keepalived.service unit under /usr/lib/systemd/system. If it is missing, copy the keepalived.service file from the keepalived source directory into that directory:
# ls -al /usr/lib/systemd/system/keepalived.service
-rw-r--r-- 1 root root 383 Jul 7 20:05 /usr/lib/systemd/system/keepalived.service
Run the following commands to enable it at boot:
# systemctl enable keepalived
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
# systemctl daemon-reload
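5) Configuring keepalived
We will not go through every keepalived option here; the default configuration after installation is given below for reference:
A few notes on the keepalived configuration file:
- Comments start with # or ! and run to the end of the line
- It usually consists of three main blocks: global_defs, vrrp_instance, and virtual_server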
In the configuration example above, an HTTP health check was added inside real_server; a TCP health check can be added as well, for example:
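5. Dual-node hot standby with keepalived
keepalived monitors the state of the backend TCP services. If a backend node providing a TCP service goes down or malfunctions, keepalived detects it promptly and removes the faulty node from the pool; once the node recovers and serves normally again, keepalived automatically adds it back to the cluster. All of this is done by keepalived without manual intervention; the only manual work is repairing the failed server. The following example demonstrates this. The test environment is:
keepalived master: 192.168.79.128
keepalived backup: 192.168.79.129
http server 1: 192.168.79.128
http server 2: 192.168.79.129
http server 3: 192.168.79.131
vip: 192.168.79.180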
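The remove-on-failure and re-add-on-recovery behavior described above can be modeled in a few lines. The sketch below is only a conceptual illustration of that bookkeeping, not how keepalived manipulates the IPVS table; reconcile_pool and the is_healthy callback are hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, Set


def reconcile_pool(
    active: Set[str],
    servers: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Set[str]:
    """Return the new set of servers that should receive traffic.

    Servers that pass the health check are added (or kept); servers
    that fail it are removed. The input set is left unmodified.
    """
    updated = set(active)
    for server in servers:
        if is_healthy(server):
            updated.add(server)
        else:
            updated.discard(server)
    return updated
```

Running it once with a check that fails for 192.168.79.131 drops that server from the pool; running it again after the server recovers puts it back, with no manual step in between.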
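Before starting, it is best to turn off SELinux. Check the current SELinux state with:
# getenforce
Enforcing
There are two ways to turn it off: temporarily or permanently.
- Temporarily (switches SELinux to permissive mode until the next reboot)
# setenforce 0
- Permanently
Edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, and reboot the operating system.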
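5.1 Installing keepalived and the nginx servers
1) Install keepalived
Install keepalived on the two hosts 192.168.79.128 and 192.168.79.129, following the installation steps earlier in this article.
2) Install nginx
Install nginx on the three hosts 192.168.79.128, 192.168.79.129, and 192.168.79.131 (the installation itself is not covered here), then start the nginx service.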
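5.2 keepalived configuration
- keepalived master configuration
On host 192.168.79.128, back up the original keepalived.conf, then change the configuration to the following:
- keepalived backup configuration
On host 192.168.79.129, back up the original keepalived.conf, then change the configuration to the following (the main changes are the state and priority fields):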
5.3 Starting the keepalived service
1) Start keepalived
Run the following commands to start the master and backup keepalived services and check their status:
# systemctl start keepalived    # on the master
# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-07-08 03:10:48 PDT; 6s ago
  Process: 103385 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 103388 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─103388 /usr/local/sbin/keepalived -D
           ├─103389 /usr/local/sbin/keepalived -D
           └─103390 /usr/local/sbin/keepalived -D
Jul 08 03:10:48 localhost.localdomain Keepalived_vrrp[103390]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(0), fd(11,12)]
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) Receive advertisement timeout
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) Entering MASTER STATE
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) setting VIPs.
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: Sending gratuitous ARP on ens33 for 192.168.79.180
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) Sending/queueing gratuitous ARPs on ens33 for 192.168.79.180
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: Sending gratuitous ARP on ens33 for 192.168.79.180
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: Sending gratuitous ARP on ens33 for 192.168.79.180
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: Sending gratuitous ARP on ens33 for 192.168.79.180
Jul 08 03:10:52 localhost.localdomain Keepalived_vrrp[103390]: Sending gratuitous ARP on ens33 for 192.168.79.180
# systemctl start keepalived    # on the backup
# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-07-08 18:17:45 CST; 2min 21s ago
  Process: 48763 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 48767 (keepalived)
   Memory: 932.0K
   CGroup: /system.slice/keepalived.service
           ├─48767 /usr/local/sbin/keepalived -D
           ├─48768 /usr/local/sbin/keepalived -D
           └─48769 /usr/local/sbin/keepalived -D
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: Registering Kernel netlink command channel
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: Assigned address 192.168.79.129 for interface ens33
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: Assigned address fe80::7e75:c1ed:6f41:49d4 for interface ens33
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: Registering gratuitous ARP shared channel
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: (VI_180) removing VIPs.
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: (VI_180) Entering BACKUP STATE (init)
Jul 08 18:17:43 localhost.localdomain Keepalived_vrrp[48769]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(0), fd(11,12)]
Jul 08 18:17:44 localhost.localdomain Keepalived_healthcheckers[48768]: Gained quorum 1+0=1 <= 3 for VS [192.168.79.180]:tcp:80
Jul 08 18:17:45 localhost.localdomain systemd[1]: Started LVS and VRRP High Availability Monitor.
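2) Check the IP addresses of the keepalived hosts
Check the host IPs on 192.168.79.128:
The VIP is bound on the master keepalived host. Checking the backup keepalived host in the same way shows that the VIP is not bound there at this point.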
5.4 Testing keepalived
We request the HTTP service on the VIP with curl:
# curl -X GET http://192.168.79.180/
The service responds normally.
Next, stop the master keepalived, that is, the keepalived service on 192.168.79.128, with the following commands:
# systemctl stop keepalived
# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-07-08 03:34:39 PDT; 7s ago
  Process: 103385 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 103388 (code=exited, status=0/SUCCESS)
Jul 08 03:34:38 localhost.localdomain systemd[1]: Stopping LVS and VRRP High Availability Monitor...
Jul 08 03:34:38 localhost.localdomain Keepalived[103388]: Stopping
Jul 08 03:34:38 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) sent 0 priority
Jul 08 03:34:38 localhost.localdomain Keepalived_vrrp[103390]: (VI_180) removing VIPs.
Jul 08 03:34:38 localhost.localdomain Keepalived_healthcheckers[103389]: Shutting down service [192.168.79.128]:tcp:80 from VS [192.168.79.180]:tcp:80
Jul 08 03:34:38 localhost.localdomain Keepalived_healthcheckers[103389]: Shutting down service [192.168.79.129]:tcp:80 from VS [192.168.79.180]:tcp:80
Jul 08 03:34:38 localhost.localdomain Keepalived_healthcheckers[103389]: Shutting down service [192.168.79.131]:tcp:80 from VS [192.168.79.180]:tcp:80
Jul 08 03:34:39 localhost.localdomain Keepalived_vrrp[103390]: Stopped - used 0.019591 user time, 0.489784 system time
Jul 08 03:34:39 localhost.localdomain Keepalived[103388]: Stopped Keepalived v2.0.17 (06/25,2019)
Jul 08 03:34:39 localhost.localdomain systemd[1]: Stopped LVS and VRRP High Availability Monitor.
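Then check the keepalived service on the backup host 192.168.79.129:
The VIP is now bound on host 192.168.79.129. Requesting the nginx service again shows that it still works.
Finally, restart the keepalived service on host 192.168.79.128. The VIP moves back to 192.168.79.128, the master keepalived host, and is unbound from 192.168.79.129, the backup keepalived host.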
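To observe a failover like this from a client's point of view, one option is to poll the VIP while stopping and restarting keepalived and count the requests that fail. The sketch below is one hedged way to do that with the Python standard library; probe and count_failures are names invented for this example, and the explicit proxy bypass is our own design choice so that requests go straight to the VIP.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Optional

# An opener with an empty ProxyHandler ignores any proxy settings from
# the environment, so each request is sent directly to the given URL.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a success status.
        return exc.code
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, or a malformed response.
        return None


def count_failures(url: str, attempts: int, interval: float = 1.0) -> int:
    """Probe url repeatedly and return how many attempts got no answer."""
    failures = 0
    for i in range(attempts):
        if probe(url) is None:
            failures += 1
        if i + 1 < attempts:
            time.sleep(interval)
    return failures
```

Calling count_failures("http://192.168.79.180/", 30) while the master is stopped would give a rough idea of how many one-second probes were lost during the switch.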
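5.5 Inspecting VRRP packets
We capture packets on three hosts in the LAN:
- keepalived master: 192.168.79.128
- keepalived backup: 192.168.79.129
- http server 1: 192.168.79.128
Run the following command to capture packets (keepalived's default multicast address is 224.0.0.18, which can be changed with the vrrp_mcast_group4 option):
One multicast packet is sent every second.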
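To make sense of what such a capture contains, it helps to know the layout of a VRRPv2 advertisement as defined in RFC 3768: a one-byte version/type field, VRID, priority, address count, authentication type, advertisement interval, a two-byte checksum, and then the virtual IPv4 addresses. The sketch below decodes those fields from the payload of an IP protocol 112 packet. It does not verify the checksum, and the VRID and priority used in the note below are made-up sample values.

```python
import socket
import struct
from typing import List, NamedTuple


class VrrpAdvert(NamedTuple):
    version: int
    type: int
    vrid: int
    priority: int
    advert_int: int
    addresses: List[str]


def parse_vrrpv2(payload: bytes) -> VrrpAdvert:
    """Decode the fixed header and address list of a VRRPv2 packet."""
    if len(payload) < 8:
        raise ValueError("VRRP header is 8 bytes long")
    # Six single-byte fields followed by the 16-bit checksum.
    ver_type, vrid, priority, count, _auth_type, advert_int, _checksum = (
        struct.unpack("!BBBBBBH", payload[:8])
    )
    end = 8 + 4 * count
    if len(payload) < end:
        raise ValueError("packet too short for the advertised address count")
    addresses = [
        socket.inet_ntoa(payload[i:i + 4]) for i in range(8, end, 4)
    ]
    # The high nibble carries the version, the low nibble the type.
    return VrrpAdvert(
        ver_type >> 4, ver_type & 0x0F, vrid, priority, advert_int, addresses
    )
```

A packet whose first byte is 0x21 is a version 2 advertisement (type 1); an advertisement interval of 1 matches the one-packet-per-second rate seen in the capture.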
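6. Addendum
keepalived's fwmark support is often combined with iptables. Typically, traffic is first marked in the iptables mangle table, and keepalived then forwards traffic according to the mark. For example:
The first rule above means: traffic whose destination port is 17443 (and whose source MAC address is not F8:98:EF:7E:9E:8B, which is usually the NIC address of the other LVS replica) is marked as 617443 (0x96be3).
Note: 10.4.18.69 above should be the VIP address to be configured.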
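The mark in the rule above is given in both decimal and hexadecimal because tools differ in which notation they display. A small helper such as the following (our own, purely for convenience) converts between the two, so the value used on the iptables side and on the keepalived side can be checked against each other.

```python
def parse_mark(text: str) -> int:
    """Parse a firewall mark written in decimal ("617443") or hex ("0x96be3")."""
    # Base 0 lets int() infer the base from the prefix.
    return int(text, 0)


def mark_notations(mark: int) -> tuple:
    """Return the mark as (decimal string, hexadecimal string)."""
    return str(mark), hex(mark)
```

Both parse_mark("617443") and parse_mark("0x96be3") yield the same integer, which confirms that the two notations in the text refer to one mark.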
We can then configure keepalived as follows to handle traffic carrying this mark: