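Today, while installing keepalived-1.3.5 + nginx for high availability, I found that keepalived simply refused to start.
The problem is solved now; this is a record of the painful troubleshooting process.
1. Installation (omitted)
Plenty of guides can be found through Baidu or Google.
2. Configure keepalived to start at boot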
[root@master1 keepalived-1.3.5] # cp /usr/local/src/keepalived-1.3.5/keepalived/etc/init.d/keepalived /etc/rc.d/init.d/
[root@master1 keepalived-1.3.5] # cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
[root@master1 keepalived-1.3.5] # mkdir /etc/keepalived/
[root@master1 keepalived-1.3.5] # cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
[root@master1 keepalived-1.3.5] # cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
[root@master1 keepalived-1.3.5] # echo "/etc/init.d/keepalived start" >> /etc/rc.local
[root@master1 keepalived-1.3.5] # systemctl enable keepalived
[root@master1 keepalived-1.3.5] # systemctl start keepalived
3. Startup fails with an error
The message:
[root@cqdsrmyy-app-01 keepalived]# /etc/init.d/keepalived start
Starting keepalived (via systemctl): Job for keepalived.service failed because a timeout was exceeded. See "systemctl status keepalived.service" and "journalctl -xe" for details.
[FAILED]
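Solution:
Most problems are already spelled out clearly in the logs. You only need to follow the hints in the logs to track them down.
Here the logs pointed to a PID file that could not be found: with Type=forking, systemd waits for the file named in PIDFile and times out if it never appears. We can trace this through the unit file.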
[root@cqdsrmyy-app-01 keepalived]# cat /usr/lib/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network-online.target
[Service]
Type=forking
PIDFile=/usr/local/keepalived/var/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/keepalived/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
[root@cqdsrmyy-app-01 keepalived]#
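The PID file specified here does not actually exist on the server, so the line needs to be changed to PIDFile=/var/run/keepalived.pid
Save the file, run systemctl daemon-reload so that systemd re-reads the unit, and then start keepalived again.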
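The edit described above can be sketched as a pair of small shell helpers. This is only a sketch: the function names pidfile_of and fix_pidfile are made up for illustration, and sed -i is assumed to be GNU sed, as found on typical CentOS hosts.

```shell
#!/bin/sh
# Sketch only: pidfile_of and fix_pidfile are hypothetical helper names.

# pidfile_of UNIT_FILE
# Print the path that the unit file declares in its PIDFile= line.
pidfile_of() {
    sed -n 's/^PIDFile=//p' "$1"
}

# fix_pidfile UNIT_FILE NEW_PID_PATH
# Keep a .bak copy of the unit file, then point PIDFile= at NEW_PID_PATH.
fix_pidfile() {
    cp -p "$1" "$1.bak" || return 1
    sed -i "s|^PIDFile=.*|PIDFile=$2|" "$1"   # -i as implemented by GNU sed
}

# On the host from this post one would run, as root:
#   unit=/usr/lib/systemd/system/keepalived.service
#   ls -ld "$(dirname "$(pidfile_of "$unit")")"   # does the declared directory exist?
#   fix_pidfile "$unit" /var/run/keepalived.pid
#   systemctl daemon-reload                       # make systemd re-read the edited unit
#   systemctl restart keepalived
```

Keeping the backup copy makes it easy to diff the unit file later if the package is reinstalled and the change is overwritten.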
[root@cqdsrmyy-app-01 run]# systemctl start keepalived
[root@cqdsrmyy-app-01 run]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-26 03:31:17 EDT; 8s ago
Process: 25043 ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 25044 (keepalived)
CGroup: /system.slice/keepalived.service
├─ 8983 nginx: master process /opt/nginx/sbin/nginx
├─ 8984 nginx: worker process
├─ 8985 nginx: worker process
├─ 8986 nginx: worker process
├─ 8987 nginx: worker process
├─ 8988 nginx: worker process
├─ 8989 nginx: worker process
├─ 8991 nginx: worker process
├─ 8992 nginx: worker process
├─25044 /usr/local/keepalived/sbin/keepalived -f /etc/keepalived/keepalived.conf -D -S 0
├─25045 /usr/local/keepalived/sbin/keepalived -f /etc/keepalived/keepalived.conf -D -S 0
└─25046 /usr/local/keepalived/sbin/keepalived -f /etc/keepalived/keepalived.conf -D -S 0
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived[25043]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived[25044]: Starting Healthcheck child process, pid=25045
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_healthcheckers[25045]: Initializing ipvs
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived[25044]: Starting VRRP child process, pid=25046
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_healthcheckers[25045]: Opening file '/etc/keepalived/kee....
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_vrrp[25046]: Registering Kernel netlink reflector
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_vrrp[25046]: Registering Kernel netlink command channel
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_vrrp[25046]: Registering gratuitous ARP shared channel
Mar 26 03:31:17 cqdsrmyy-app-01 Keepalived_vrrp[25046]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 26 03:31:17 cqdsrmyy-app-01 systemd[1]: Started LVS and VRRP High Availability Monitor.
Hint: Some lines were ellipsized, use -l to show in full.
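The startup problem was solved, but during testing I found that the VIP was present on both the master and the backup at the same time. After some digging, this turned out to be caused by a mistake in keepalived.conf.
For the record: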
! Configuration File for keepalived
global_defs {
script_user root
enable_script_security
}
vrrp_script check_nginx {
script "/etc/keepalived/nginx_check.sh"
interval 10
}
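vrrp_instance VI_1 { # define an instance
state BACKUP # role of this keepalived node: MASTER means this host is the primary server, BACKUP means it is the standby, so MASTER must be given a higher priority than BACKUP. nopreempt only works with state BACKUP, so when it is set the roles are decided by priority rather than by this value.
nopreempt # do not preempt
interface eth0 # network interface to monitor; when LVS takes over, the IP address is added to this NIC.
virtual_router_id 101 # virtual router ID, a unique identifier for each VRRP instance; MASTER and BACKUP of the same vrrp_instance must use the same value.
priority 100 # priority of this instance
unicast_src_ip 192.168.1.14 # unicast source address
unicast_peer {
192.168.1.15 # unicast destination address
} # in multicast mode keepalived sends everything to the multicast address 224.0.0.18, which produces a lot of useless traffic and can cause interference and conflicts. Switching from multicast to unicast is the safer approach and avoids virtual router ID conflicts when many keepalived instances share a LAN.
advert_int 1 # interval between heartbeat advertisements
authentication {
auth_type PASS # authentication type; the two main ones are PASS and AH
auth_pass test123 # authentication password; MASTER and BACKUP of the same vrrp_instance must share the same password to communicate
}
virtual_ipaddress { # virtual IP addresses; several can be set, one per line
118.24.101.16/24 dev eth1
}
track_interface { # extra monitoring: a failure on any NIC listed here triggers a switchover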
eth0
}
track_script {
check_nginx
}
}
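The key point is virtual_router_id, the virtual router identifier: each VRRP instance has its own ID, and within the same vrrp_instance the MASTER and BACKUP must use the same value. I had treated it like MySQL master/slave replication, where each server needs a different ID. With mismatched IDs the two nodes could not find each other over VRRP, so each one claimed the VIP. After setting both to the same value the problem went away.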
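Retesting: after killing nginx on one of the nodes, keepalived quickly brought it back up, because that is what the nginx_check.sh script referenced in the config was written to do.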
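The check script itself is not shown in this post. A minimal sketch of what such a script might look like follows, under stated assumptions: the nginx binary path is taken from the systemctl status output above, while the function names and the NGINX_BIN and RETRY_WAIT variables are invented for illustration and are not the author's actual script.

```shell
#!/bin/sh
# Hypothetical sketch of /etc/keepalived/nginx_check.sh, not the original script.

NGINX_BIN=${NGINX_BIN:-/opt/nginx/sbin/nginx}   # path seen in the status output
RETRY_WAIT=${RETRY_WAIT:-2}                     # seconds to wait after a restart attempt

# nginx_running: succeed if a process named exactly "nginx" exists.
nginx_running() {
    pgrep -x nginx >/dev/null 2>&1
}

# check_nginx: if nginx is down, try to start it once, then report the result.
# A non-zero return tells keepalived that the tracked script failed.
check_nginx() {
    nginx_running && return 0
    "$NGINX_BIN" >/dev/null 2>&1
    sleep "$RETRY_WAIT"
    nginx_running
}

# The real script would end by calling the function, so that its return
# value becomes the exit status keepalived sees:
#   check_nginx
```

The wait is kept well below the interval 10 set in vrrp_script so that one run finishes before the next one is due.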
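Then kill keepalived and nginx at the same time: killall nginx keepalived
The VIP quickly floated over to the other server. With that, the keepalived configuration and testing were complete.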
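To confirm where the VIP currently lives, the same check can be run on both nodes; exactly one of them should report it. The sketch below parses the one-line output of ip -o -4 addr show, and the helper name holds_vip is made up for illustration.

```shell
#!/bin/sh
# Sketch only: holds_vip is a hypothetical helper name.

# holds_vip VIP
# Read `ip -o -4 addr show` output on stdin and succeed if VIP is assigned
# to any interface. In that output the fourth field is ADDRESS/PREFIX.
holds_vip() {
    awk -v vip="$1" '
        { split($4, parts, "/"); if (parts[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Run on each node:
#   if ip -o -4 addr show | holds_vip 118.24.101.16; then
#       echo "VIP is on $(hostname)"
#   fi
```

If both nodes report the VIP at once, the two instances are not seeing each other's advertisements, which is the symptom described earlier.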