[Keepalived-devel] vrrp_script, notify_fault, backup/fault state

Discussion:

Paul Hirose

2009-11-25 19:58:25 UTC

Keepalived 1.1.19 running on CentOS 5.4 using HAProxy 1.3.22 as the service traffic load balancer. This much works fine :) I have two LBs both running KA, and it fails over wonderfully when I pull the plug on one LB or the other.

I was trying to use the vrrp_script to watch the haproxy software running on the active/master load-balancer:
vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
}
track_script {
    chk_haproxy
}
I found this works fine as long as haproxy is running on the master.

When I manually kill haproxy to test it, Keepalived switches to the FAULT state. I thought (incorrectly, I guess?) this would tell the LB in the BACKUP state to switch to the MASTER state (as it would if the first LB actually went down). But it doesn't. The LB in the BACKUP state stays there. So now I have one Keepalived system in the BACKUP state and another in the FAULT state, and none in the MASTER state.

First, is this correct and expected? Or am I missing something that should have made the Backup go into the Master state, when the current Master goes into the Fault state?

I also use the following:
notify_master /opt/keepalived/bin/notify_master.sh
notify_backup /opt/keepalived/bin/notify_backup.sh
notify_fault /opt/keepalived/bin/notify_backup.sh
The notify_master.sh script just runs haproxy. So when a system goes from whatever state to Master state, it should start the haproxy process. This works :) When a system goes from whatever state to the Backup state, it kills the haproxy process. I wasn't sure what Fault state really was (I'm still not), so I run the same script when a system goes into Fault state, and kill the haproxy.

Is a vrrp_script failure the only way a system goes into the Fault state? The only way that script fails is if haproxy isn't running. So if that is what causes the system to enter the Fault state, I could create a notify_fault.sh which would basically be the same as notify_master.sh (e.g., it would (re)execute the haproxy program). I'd then have to modify notify_master.sh to check for the existence of haproxy before trying to run it (probably a good idea anyway).
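The guard described above might look something like the following sketch. It assumes the haproxy binary and config paths from the notify_master.sh shown later in this thread, and it reuses the killall -0 probe from the vrrp_script. The helper names haproxy_running and start_haproxy_if_needed are made up for illustration.

```shell
#!/bin/sh
# Sketch only: start haproxy from a notify script unless it is already running.
# The default paths are the ones used in this thread; override them via the
# environment if your layout differs.
HAPROXY_BIN="${HAPROXY_BIN:-/opt/haproxy/sbin/haproxy}"
HAPROXY_CFG="${HAPROXY_CFG:-/opt/haproxy/etc/haproxy.cfg}"

# Exit status 0 if a process named haproxy exists (signal 0 only probes,
# it does not deliver a signal).
haproxy_running() {
    killall -0 haproxy >/dev/null 2>&1
}

# Hypothetical helper: launch haproxy in the background only when the probe
# says it is absent, and report which branch was taken.
start_haproxy_if_needed() {
    if haproxy_running; then
        echo "haproxy already running"
    else
        "$HAPROXY_BIN" -f "$HAPROXY_CFG" &
        echo "starting haproxy"
    fi
}
```

Ending the script with a call to start_haproxy_if_needed followed by exit 0 would mirror the existing notify scripts, and the same file could then be referenced from both notify_master and notify_fault.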

I know I could eliminate all this by dropping HAProxy and using the full ability of Keepalived (I'm essentially only using it in VRRP mode, since I have no virtual_servers defined). I just found haproxy easier to get going than the virtual_server part of Keepalived (not necessarily because of Keepalived itself, but because of the whole NAT/DR/TUN choice, and I'm not sure how my network is laid out). As far as I can tell, my network is still one single network, all using real/routable IP addresses, all with single interfaces, etc. I tried going through the LVS-DR document about using only one interface, but that didn't work out :)

If anyone can help me out with this whole fault/backup/master state stuff and transitioning between them, and/or how to get an existing backup-state system to go to master when the current master goes into fault, etc, I'd greatly appreciate it.

Thank you,
PH

Paul Hirose : ***@ucdavis.edu : Sysadm Motto: rm -fr /MyLife

Graeme Fowler

2009-11-25 20:09:08 UTC

Post by Paul Hirose
Keepalived 1.1.19 running on CentOS 5.4 using HAProxy 1.3.22 as the service traffic load balancer. This much works fine :) I have two LBs both running KA, and it fails over wonderfully when I pull the plug on one LB or the other.
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
script "killall -0 haproxy" # cheaper than pidof
interval 2 # check every 2 seconds
}
track_script {
chk_haproxy
}
I found this works fine as long as haproxy is running on the master.

Try adding a "weight" clause to the vrrp_script stanza, like so:

vrrp_script check_cgp {
    script "/usr/local/bin/check_cgp"
    interval 10
    weight 10
}

Then in the track_script stanza:

track_script {
    check_cgp weight 20
}

That's what we do here, and it works perfectly. The VRRP prio gets
adjusted according to the weights on test pass/fail, and the backup then
switches to MASTER.
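
To make the arithmetic concrete, here is a rough sketch. It assumes the commonly described model in which a positive weight is added to the base priority while the check passes, and a negative weight is applied while the check fails. The function name effective_priority and the base priorities 100 and 90 are illustrative assumptions, not keepalived internals; the weight of 20 is the value given in track_script above.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_OK
# Assumed model (not taken from keepalived source):
#   weight > 0 : BASE + WEIGHT while the check passes (CHECK_OK=1)
#   weight < 0 : BASE + WEIGHT (i.e. lowered) while the check fails (CHECK_OK=0)
#   otherwise  : BASE unchanged
effective_priority() {
    base=$1; weight=$2; ok=$3
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# Illustrative base priorities: 100 on the master, 90 on the backup.
echo "both checks pass:   master=$(effective_priority 100 20 1) backup=$(effective_priority 90 20 1)"
echo "master check fails: master=$(effective_priority 100 20 0) backup=$(effective_priority 90 20 1)"
echo "both checks fail:   master=$(effective_priority 100 20 0) backup=$(effective_priority 90 20 0)"
```

Under this model the three lines come out as 120 vs 110, 100 vs 110, and 100 vs 90. The backup only overtakes the master in the second case, so the check has to be passing on the backup for the failover to happen.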

Graeme

Paul Hirose

2009-11-25 21:30:04 UTC

Post by Graeme Fowler

vrrp_script check_cgp {
script "/usr/local/bin/check_cgp"
interval 10
weight 10
}
track_script {
check_cgp weight 20
}
That's what we do here, and it works perfectly. The VRRP prio gets
adjusted according to the weights on test pass/fail, and the backup then
switches to MASTER.

I made the change, and now my Master won't even transition into the Fault state anymore. Incidentally, I did run an iptables -I RH-Firewall-1-INPUT 9 -p vrrp -j ACCEPT on both the master and the backup load balancer. If I physically pull the network cable out of lb1 (the master), then lb2 (the backup) does go to Master status. And if I plug lb1's cable back in, it goes back to Master status (not sure if it ever actually changed state while I had the cable pulled) and lb2 switches to Backup status. But just killing keepalived on lb1 doesn't make lb2 become the Master. Are there more ports I need to open on the local host-based firewall, other than allowing protocol vrrp (-p vrrp)?

In /var/log/messages on the Master, the line at 13:03:38 is when I ran kill -TERM on haproxy from a different window, and the line at 13:18:18 is when I restarted haproxy manually.

The /var/log/messages on the Master
===================================
Nov 25 13:02:58 lbtest1 Keepalived: Starting VRRP child process, pid=29839
Nov 25 13:02:58 lbtest1 Keepalived_vrrp: Opening file '/opt/keepalived/etc/keepalived.conf'.
Nov 25 13:02:58 lbtest1 Keepalived_vrrp: Configuration is using : 35162 Bytes
Nov 25 13:02:58 lbtest1 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Nov 25 13:02:58 lbtest1 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(9,10)]
Nov 25 13:02:59 lbtest1 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 25 13:03:00 lbtest1 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 25 13:03:00 lbtest1 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 25 13:03:00 lbtest1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 128.120.33.211
Nov 25 13:03:00 lbtest1 avahi-daemon[2136]: Registering new address record for 128.120.33.211 on eth0.
Nov 25 13:03:05 lbtest1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 128.120.33.211
Nov 25 13:03:08 lbtest1 Keepalived_vrrp: VRRP_Script(chk_haproxy) succeeded
Nov 25 13:03:38 lbtest1 Keepalived_vrrp: VRRP_Script(chk_haproxy) failed
Nov 25 13:18:18 lbtest1 Keepalived_vrrp: VRRP_Script(chk_haproxy) succeeded

The /var/log/messages on the Backup
===================================
Nov 25 13:04:05 lbtest2 Keepalived: Starting Keepalived v1.1.19 (11/20,2009)
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Registering Kernel netlink reflector
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Registering Kernel netlink command channel
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Registering gratutious ARP shared channel
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Opening file '/opt/keepalived/etc/keepalived.conf'.
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Configuration is using : 35160 Bytes
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 25 13:04:05 lbtest2 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(9,10)]
Nov 25 13:04:05 lbtest2 Keepalived: Starting VRRP child process, pid=12686

And my keepalived.conf on the Master
====================================
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 10
    weight 10
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        A.B.C.1
    }
    notify_master /opt/keepalived/bin/notify_master.sh
    notify_backup /opt/keepalived/bin/notify_backup.sh
    notify_fault /opt/keepalived/bin/notify_backup.sh
    track_script {
        chk_haproxy weight 20
    }
}

And my keepalived.conf on the Backup
====================================
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 10
    weight 10
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    virtual_router_id 51
    priority 90
    virtual_ipaddress {
        A.B.C.2
    }
    notify_master /opt/keepalived/bin/notify_master.sh
    notify_backup /opt/keepalived/bin/notify_backup.sh
    notify_fault /opt/keepalived/bin/notify_backup.sh
    track_script {
        chk_haproxy weight 20
    }
}

The notify_master.sh script on both Master and Backup
=====================================================
#!/bin/sh
/opt/haproxy/sbin/haproxy -f /opt/haproxy/etc/haproxy.cfg &
exit 0

The notify_backup.sh script on both Master and Backup
=====================================================
#!/bin/sh
/usr/bin/killall -TERM haproxy
exit 0

PH

Paul Hirose : ***@ucdavis.edu : Sysadm Motto: rm -fr /MyLife