Discussion:
[Keepalived-devel] New Master stay the master
Paul Hirose
2010-09-17 00:47:24 UTC
keepalived 1.1.20 on RHEL 5.5 64bit. slb1 is the master w/priority 101, and slb2 is the backup w/priority 100.

When slb1 fails, all connections going through it die along with slb1. slb2 takes over and all subsequent new connections go through slb2. When slb1 is fixed and returned to service, it becomes the master again and takes over the VIP. Wouldn't this kill all connections using slb2? I'd like to prevent that from happening and have slb2 simply remain the master, even though slb1 (with a higher priority) has come back up. I could manually edit the priority of slb1 down to 99 before bringing it up, I suppose, but sometimes that isn't possible for whatever reason (an unscheduled accidental reboot of slb1, for example).

Is there a way to make a backup that becomes master remain the master, even when the original master comes back up?

Thank you,
PH
==
Paul Hirose
Stig Thormodsrud
2010-09-17 00:54:17 UTC
Post by Paul Hirose
keepalived 1.1.20 on RHEL 5.5 64bit. slb1 is the master w/priority 101, and slb2 is the backup w/priority 100.
When slb1 fails, all connections going through it die along with slb1. slb2 takes over and all subsequent new connections go through slb2. When slb1 is fixed and returned to service, it becomes the master again and takes over the VIP. Wouldn't this kill all connections using slb2? I'd like to prevent that from happening, and have slb2 simply remain the master, even though slb1 (with a higher priority) has come back up. I could manually edit the priority of slb1 down to 99 before bringing it up, I suppose but sometimes that isn't possible for whatever reason (unscheduled accidental reboot of slb1 for example.)
Is there a way to make a backup who becomes a master, to remain the master even when the original master comes back up?
Have you tried adding "nopreempt" to the vrrp_instance?
Paul Hirose
2010-09-17 01:12:07 UTC
Post by Stig Thormodsrud
Post by Paul Hirose
keepalived 1.1.20 on RHEL 5.5 64bit. slb1 is the master w/priority 101, and slb2 is the backup w/priority 100.
When slb1 fails, all connections going through it die along with slb1. slb2 takes over and all subsequent new connections go through slb2. When slb1 is fixed and returned to service, it becomes the master again and takes over the VIP. Wouldn't this kill all connections using slb2? I'd like to prevent that from happening, and have slb2 simply remain the master, even though slb1 (with a higher priority) has come back up. I could manually edit the priority of slb1 down to 99 before bringing it up, I suppose but sometimes that isn't possible for whatever reason (unscheduled accidental reboot of slb1 for example.)
Is there a way to make a backup who becomes a master, to remain the master even when the original master comes back up?
Have you tried adding "nopreempt" to the vrrp_instance?
Ah, I missed that in the keepalived.conf man page, my apologies, and thank you.

The man page states "the initial state of this entry must be BACKUP". I'm guessing "this entry" refers to the vrrp_instance name {} section. The keepalived.conf.sample file doesn't have "state BACKUP", so I'm a bit confused; maybe the note is referring to something else being the backup?

But if I understand correctly, I believe what I would then have on my slb1 master is:
vrrp_instance VI {
    priority 101
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 95
    virtual_ipaddress {
        1.2.3.4
    }
}

And on my slb2 backup I would have:
vrrp_instance VI {
    priority 100
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 95
    virtual_ipaddress {
        1.2.3.4
    }
}

So both slb1/2 are configured with "state BACKUP". But when freshly booted, slb1 will nonetheless become the master because of its higher priority and the fact that slb2 is not already the master (in fact there is no master at all). Then when slb1 fails, slb2 will become the master despite the nopreempt, because there's nothing for it to preempt: there is no active master at all. Then, when slb1 returns, because of the nopreempt it will not take over the VIP even though it has a higher priority. Should slb2 (the now-current master) fail, slb1 will then take over, even though it has the nopreempt directive, because once again there's no master at all?

I'm not sure if the state BACKUP/nopreempt lines are necessary on slb2, but I'd like to keep my configurations as similar as possible. The only things different about them right now are the priority and the notification_email_from lines.

Thank you,
PH
==
Paul Hirose
University of California, Davis
***@ucdavis.edu
Bryan Talbot
2010-09-17 04:34:32 UTC
Coincidentally, I've been trying to get 'nopreempt' to work as well,
but can't seem to get it working. The config is the same on both
servers except that the priority is changed to 100 on the other server.
I'm testing on CentOS 5.5 32 bit with keepalived 1.1.20.


vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 20
    priority 101
    virtual_ipaddress {
        10.79.8.20
    }
    track_script {
        chk_haproxy
    }
}

To test the failover, I kill the haproxy instance on the current
master. The logs indicate that the check script fails, but no
failover ever occurs. If the keepalived process itself is killed, the
failover occurs as expected.

Log from MASTER after killing haproxy:
Sep 17 00:21:47 cl-t099-281cl Keepalived: Starting Keepalived v1.1.20
(09/15,2010)
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Registering Kernel
netlink reflector
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Registering Kernel
netlink command channel
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Registering gratutious
ARP shared channel
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Opening file
'/etc/keepalived/keepalived.conf'.
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Configuration is using
: 34460 Bytes
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: ------< Global
definitions >------
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Router ID = abc
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Smtp server = 127.0.0.1
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Smtp server connection
timeout = 30
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Email notification
from = ***@abc
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: ------< VRRP Topology >------
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: VRRP Instance = VI_1
Sep 17 00:21:47 cl-t099-281cl Keepalived: Starting VRRP child process, pid=15191
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Want State = BACKUP
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Runing on device = eth0
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Virtual Router ID = 20
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Priority = 101
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Advert interval = 1sec
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Preempt disabled
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Tracked scripts = 1
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: chk_haproxy weight 2
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Virtual IP = 1
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: 10.79.8.20/32 dev
eth0 scope global
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: ------< VRRP Scripts >------
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: VRRP Script = chk_haproxy
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Command = killall -0 haproxy
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Interval = 2 sec
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Weight = 2
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Rise = 1
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Full = 1
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Status = INIT
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: Using LinkWatch kernel
netlink reflector...
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
Entering BACKUP STATE
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp: VRRP sockpool:
[ifindex(2), proto(112), fd(9,10)]
Sep 17 00:21:47 cl-t099-281cl Keepalived_vrrp:
VRRP_Script(chk_haproxy) succeeded
Sep 17 00:21:51 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
Transition to MASTER STATE
Sep 17 00:21:52 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
Entering MASTER STATE
Sep 17 00:21:52 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
setting protocol VIPs.
Sep 17 00:21:52 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
Sending gratuitous ARPs on eth0 for 10.79.8.20
Sep 17 00:21:57 cl-t099-281cl Keepalived_vrrp: VRRP_Instance(VI_1)
Sending gratuitous ARPs on eth0 for 10.79.8.20
Sep 17 00:22:07 cl-t099-281cl Keepalived_vrrp: VRRP_Script(chk_haproxy) failed



Log from BACKUP after killing haproxy on master:
Sep 17 00:21:52 cl-t099-291cl Keepalived_vrrp: Using LinkWatch kernel
netlink reflector...
Sep 17 00:21:52 cl-t099-291cl Keepalived_vrrp: VRRP_Instance(VI_1)
Entering BACKUP STATE
Sep 17 00:21:52 cl-t099-291cl Keepalived_vrrp: VRRP sockpool:
[ifindex(2), proto(112), fd(9,10)]
Sep 17 00:21:52 cl-t099-291cl Keepalived_vrrp:
VRRP_Script(chk_haproxy) succeeded
Brad Schick
2010-09-17 04:55:12 UTC
Post by Bryan Talbot
Coincidentally, I've been trying to get 'nopreempt' to work as well,
but can't seem to get it working. The config is the same on both
servers except for the priority is changed to 100 on the other server.
I'm testing on CentOS 5.5 32 bit with keepalived 1.1.20.
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
state BACKUP
nopreempt
interface eth0
virtual_router_id 20
priority 101
virtual_ipaddress {
10.79.8.20
}
track_script {
chk_haproxy
}
}
To test the failover, I kill the haproxy instance on the current
master. The logs indicate that the check script fails, but no
failover ever occurs. If the keepalived process itself is killed, the
failover occurs as expected.
I think you are misunderstanding either nopreempt or priorities. nopreempt prevents a backup from taking over from another server when the backup's priority becomes higher. A backup will still take over in the case of an outright failure, however.

In your example, when haproxy goes away the priority simply goes down (or more correctly, stops going up). Since you have nopreempt set on both servers, however, nothing is going to happen as a result of the priority change.
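A quick sketch of that arithmetic, under keepalived's positive-weight semantics as described in this thread (a succeeding track script adds its weight to the base priority; a failing one merely withholds the bonus):

```shell
# Effective priorities in Bryan's setup: bases 101 and 100, chk_haproxy weight 2.
# A succeeding script adds the weight; a failing one just stops adding it.
base_master=101; base_backup=100; weight=2
eff() { if [ "$2" = up ]; then echo $(($1 + weight)); else echo "$1"; fi; }
# haproxy killed on the master, still running on the backup:
echo "master=$(eff $base_master down) backup=$(eff $base_backup up)"
# prints: master=101 backup=102
```

So the backup's effective priority does end up higher (102 > 101), but with nopreempt set on the backup as well, nothing moves.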

What are you trying to accomplish with nopreempt?

-Brad
Bryan Talbot
2010-09-17 05:44:46 UTC
I probably don't understand the priority and nopreempt interactions.
Please enlighten me!

My understanding was that nopreempt would prevent a node with a higher
priority from taking the master role away from a master with a lower
priority. That is why I was putting it in both vrrp_instance
sections.

What I'm trying to do is set up two servers so that neither one is an
outright master -- I don't care which one of the two is master as long
as it's exactly one of them (exactly one per vrrp group). I don't
want to experience two failover events when the high-priority master
is down for maintenance, for example.

If the vrrp_instance with the higher priority also has nopreempt set
and the vrrp_instance with the lower priority doesn't have it, then a
failover does occur when haproxy is killed. Failing back to the
instance with nopreempt when the lower-priority vrrp_instance's haproxy
is killed doesn't occur.

-Bryan
Post by Brad Schick
Post by Bryan Talbot
Coincidentally, I've been trying to get 'nopreempt' to work as well,
but can't seem to get it working.  The config is the same on both
servers except for the priority is changed to 100 on the other server.
I'm testing on CentOS 5.5 32 bit with keepalived 1.1.20.
vrrp_script chk_haproxy {
       script "killall -0 haproxy"
       interval 2
       weight 2
}
vrrp_instance VI_1 {
       state BACKUP
       nopreempt
       interface eth0
       virtual_router_id 20
       priority 101
       virtual_ipaddress {
           10.79.8.20
       }
       track_script {
           chk_haproxy
       }
}
To test the failover, I kill the haproxy instance on the current
master.  The logs indicate that the check script fails, but no
failover ever occurs.  If the keepalived process itself is killed, the
failover occurs as expected.
I think you are misunderstanding either nopreempt or priorities. nopreempt prevents a backup from taking over from another server when the backup's priority becomes higher. A backup will still take over in the case of an outright failure, however.
In your example, when haproxy goes away the priority simply goes down (or more correctly, stops going up). Since you have nopreempt set on both servers, however, nothing is going to happen as a result of the priority change.
What are you trying to accomplish with nopreempt?
-Brad
Brad Schick
2010-09-17 06:26:08 UTC
Post by Bryan Talbot
I don't
want to experience two failovers events when the high-priority master
is down for maintenance for example.
If that is the primary goal (rather than trying to avoid flapping), then I think you could just set them both to be backups with the same priority. Then start maintenance with the current backup server, then perform maintenance on the current master. Keepalived will fail over to the backup, but it should not switch back since the default priority is equal.

Then if haproxy fails on either instance the failover will also happen. It gets a bit more tricky if you also want to prevent multiple automatic failovers (flapping) between the instances.
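For concreteness, a sketch of that equal-priority variant, reusing Bryan's values (interface, VRID, and VIP are his; this exact stanza is an assumption, not something Brad posted). Both nodes get the identical config, and neither sets nopreempt:

```
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 20
    priority 100          # identical on both nodes
    virtual_ipaddress {
        10.79.8.20
    }
    track_script {
        chk_haproxy       # same check on both, so either side can fail over
    }
}
```

With equal base priorities, chk_haproxy's small bonus is what moves the VIP when the service dies on the current master; once both sides score equally again, neither should preempt the other.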
Post by Bryan Talbot
If the vrrp_instance with the higher priority also has nopreemt set
and the vrrp_instance with the lower priority doesn't have it, then a
failover does occur when haproxy is killed. Failover back to the
instance with nopreempt when the lower priority vrrp_instance haproxy
is killed doesn't occur.
Correct, you get that behavior by setting nopreempt on the higher priority instance only. The problem is that when running on the lower priority instance (without nopreempt set), you won't get a failover back to the higher priority instance that has nopreempt set even when haproxy goes away (since that only adjust's priority in your setup). You'd only get a failover back to the original master if the backup instance goes offline entirely.

-Brad
Bryan Talbot
2010-09-17 17:54:01 UTC
Post by Brad Schick
Post by Bryan Talbot
I don't
want to experience two failovers events when the high-priority master
is down for maintenance for example.
If that is the primary goal (rather than trying to avoid flapping), then I think you could just set them both to be backups with the same priority. Then start maintenance with the current backup server, then perform maintenance on the current master. Keepalived will fail over to the backup, but it should not switch back since the default priority is equal.
Then if haproxy fails on either instance the failover will also happen. It gets a bit more tricky if you also want to prevent multiple automatic failovers (flapping) between the instances.
What's required to avoid flapping and still have automatic failover on
loss of service, with no automatic fail-back to a specific master when
service is restored? Is that even possible with just 2 hosts? It seems
like a quorum-based system would be needed to do that properly.

-Bryan
Martin Barry
2010-09-30 14:31:15 UTC
$quoted_author = "Bryan Talbot" ;
Post by Bryan Talbot
What's required to avoid flapping and still have automatic failover on
loss of service, with no automatic fail-back to a specific master when
service is restored? Is that even possible with just 2 hosts? It seems
like a quorum-based system would be needed to do that properly.
Something I thought of but never got around to testing was to use the
floating VRRP priority and have two track_script entries, one that checked
the load balancing service and the other that checked whether the server was
MASTER.

The configuration files would have both start as BACKUP but the preferred
node with a slightly higher priority (so they never deadlock).

check_load_balancer would adjust the priority by a large amount (e.g. 50)
check_master would adjust the priority by a medium amount (e.g. 10)

Let's just walk through an example.

At start:
Node  Priority  State
1     101       BACKUP
2     100       BACKUP

First election won by node 1:
Node  Priority  State
1     101       MASTER
2     100       BACKUP

check_master succeeds on node 1:
Node  Priority  State
1     111       MASTER
2     100       BACKUP

check_load_balancer fails on node 1:
Node  Priority  State
1     61        MASTER
2     100       BACKUP

Next election won by node 2:
Node  Priority  State
1     61        BACKUP
2     100       MASTER

check_master succeeds on node 2 and fails on node 1:
Node  Priority  State
1     51        BACKUP
2     110       MASTER

check_load_balancer recovers on node 1:
Node  Priority  State
1     101       BACKUP
2     110       MASTER


Note that the final state is stable until node 2 either fails completely or
check_load_balancer fails. check_master increasing the priority means we
don't need nopreempt.
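A minimal keepalived sketch of that scheme, untested like the theory itself. The script names and weights follow the walkthrough; the haproxy check, the interface/VRID/VIP values (borrowed from Bryan's config), and the way check_master detects MASTER (by looking for the VIP on the interface) are all assumptions:

```
vrrp_script check_load_balancer {
    script "killall -0 haproxy"    # assumed: haproxy is the balanced service
    interval 2
    weight 50                      # large swing, per the walkthrough
}

vrrp_script check_master {
    # Hypothetical MASTER detection: succeed only while this node holds the VIP
    script "/bin/sh -c 'ip addr show dev eth0 | grep -q 10.79.8.20'"
    interval 2
    weight 10                      # medium swing, per the walkthrough
}

vrrp_instance VI_1 {
    state BACKUP                   # both nodes start as BACKUP
    interface eth0
    virtual_router_id 20
    priority 101                   # 100 on the non-preferred node
    virtual_ipaddress {
        10.79.8.20
    }
    track_script {
        check_load_balancer
        check_master
    }
}
```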

Anyone want to test my theory? :-)

cheers
Marty

Brad Schick
2010-09-17 06:05:06 UTC
Post by Paul Hirose
Post by Stig Thormodsrud
Post by Paul Hirose
keepalived 1.1.20 on RHEL 5.5 64bit. slb1 is the master w/priority 101, and slb2 is the backup w/priority 100.
When slb1 fails, all connections going through it die along with slb1. slb2 takes over and all subsequent new connections go through slb2. When slb1 is fixed and returned to service, it becomes the master again and takes over the VIP. Wouldn't this kill all connections using slb2? I'd like to prevent that from happening, and have slb2 simply remain the master, even though slb1 (with a higher priority) has come back up. I could manually edit the priority of slb1 down to 99 before bringing it up, I suppose but sometimes that isn't possible for whatever reason (unscheduled accidental reboot of slb1 for example.)
Is there a way to make a backup who becomes a master, to remain the master even when the original master comes back up?
Have you tried adding "nopreempt" to the vrrp_instance?
Ah, I missed that in the keepalived.conf man page, my apologies, and thank you.
The man pages state "the initial state of this entry must be BACKUP". I'm guessing the "this entry" is in reference to the vrrp_instance name {} section. The keepalived.conf.sample file doesn't have "state BACKUP", so I'm a bit confused, and maybe the note is refering to something else being the backup?
vrrp_instance VI {
priority 101
state BACKUP
nopreempt
interface eth0
virtual_router_id 95
virtual_address {
1.2.3.4
}
}
vrrp_instance VI {
priority 100
state BACKUP
nopreempt
interface eth0
virtual_router_id 95
virtual_address {
1.2.3.4
}
}
So both slb1/2 are configured to be "state BACKUP". But when initially freshly booted, slb1 will nonetheless become the master anyway because of its higher priority and the fact that slb2 is not the master already (and in fact there are no masters at all.) Then when slb1 fails, slb2 will become the master, despite the nopreempt, because there's nothing for it to preempt and there is no active master at all. Then, when slb1 returns, because of the nopreempt, it will not take over the VIP even though it has a higher priority. Should slb2 (the now current master) fail, slb1 will thenl take over, even though it has the nopreempt directive, because now there's no master at all?
That sounds correct. One problem with that setup is that you need to bring the machines up in the correct order. With nopreempt set, you don't have an easy way to force one to become the master short of manually failing the current master.

Here is one solution that I use: set both instances to BACKUP with the same priority, and do not use "nopreempt". Then on your intended master, track one script that reduces its priority on failure and another that boosts its priority based on some manual action you take (like creating a file). The backup instance gets neither script. For example:

vrrp_script take_over {
    # check for the existence of /tmp/vrrp_takeover, and take over if it's present
    script "rm /tmp/vrrp_takeover 2> /dev/null"
    interval 30   # check every 30 seconds
    weight 5      # add 5 points of prio if present (less than chk_whatever removes)
}

vrrp_script chk_whatever {
    script "killall -0 whatever"
    interval 2    # check every 2 seconds
    weight -10    # subtract 10 points of prio if KO
}

vrrp_instance vip_whatever {
    # This is the intended MASTER, but we want explicit takeover back to the
    # master, so make it a BACKUP and use priority boosts through /tmp/vrrp_takeover
    state BACKUP
    virtual_router_id 50
    priority 50
    track_script {
        chk_whatever
        take_over
    }
    virtual_ipaddress {
        192.168.0.10/24
    }
}

vrrp_instance vip_whatever {
    # This is the intended BACKUP; no failure checks, because we only switch
    # back to the intended master manually, never automatically
    state BACKUP
    virtual_router_id 50
    priority 50
    virtual_ipaddress {
        192.168.0.10/24
    }
}

As in your nopreempt example, this still has a potential issue: if the intended "master" fails and restarts, it will automatically take over from the backup should the backup then fail entirely. You'd have to add a notify script or something similar to force the restarted master into a fail state to prevent that from happening, and have it restart into that fail state.
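One hypothetical shape for such a notify hook (a sketch, not from this thread: keepalived invokes notify scripts with the type, instance name, and new state as arguments, and the flag-file path here is made up):

```shell
# Sketch: record the last VRRP state so a restarted "master" can be held back.
# keepalived calls a notify script as: <script> INSTANCE <name> <MASTER|BACKUP|FAULT>
HOLD_FILE=/tmp/vrrp_hold    # hypothetical path; a boot-time check would test for it

on_state_change() {
    state=$3
    case "$state" in
        MASTER)       rm -f "$HOLD_FILE" ;;
        BACKUP|FAULT) touch "$HOLD_FILE" ;;   # stay passive until an operator clears it
    esac
}

# demonstration: entering BACKUP leaves the hold flag in place
on_state_change INSTANCE vip_whatever BACKUP
```

The idea is that a companion vrrp_script (like take_over above) would keep the node's priority low while the flag file exists.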

Anyway, I think keepalived would benefit from the ability to add "preempt" and perhaps "nopreempt" to vrrp_scripts. Then you could use nopreempt on the intended master rather than equal priorities. With the feature, the track script could be something like:

vrrp_script take_over {
    script "rm /tmp/vrrp_takeover 2> /dev/null"
    interval 30
    weight 5
    preempt
}

Maybe I'll work on a patch someday.

-Brad
Brad Schick
2010-09-17 06:34:22 UTC
One thing I should mention about this... I'd assume that in rare cases, instances of equal priority could start up in a way that causes each to see another instance of equal priority trying to take over, with both of them ending up stepping down. Hopefully they wouldn't stay perfectly synchronized for long and that condition would soon end, but I'd be curious to hear what others who know more think about that.

-Brad
Paul Hirose
2010-09-17 06:21:40 UTC
Post by Brad Schick
Post by Paul Hirose
Post by Stig Thormodsrud
Post by Paul Hirose
keepalived 1.1.20 on RHEL 5.5 64bit. slb1 is the master w/priority 101, and slb2 is the backup w/priority 100.
When slb1 fails, all connections going through it die along with slb1. slb2 takes over and all subsequent new connections go through slb2. When slb1 is fixed and returned to service, it becomes the master again and takes over the VIP. Wouldn't this kill all connections using slb2? I'd like to prevent that from happening, and have slb2 simply remain the master, even though slb1 (with a higher priority) has come back up. I could manually edit the priority of slb1 down to 99 before bringing it up, I suppose but sometimes that isn't possible for whatever reason (unscheduled accidental reboot of slb1 for example.)
Is there a way to make a backup who becomes a master, to remain the master even when the original master comes back up?
Have you tried adding "nopreempt" to the vrrp_instance?
Ah, I missed that in the keepalived.conf man page, my apologies, and thank you.
The man pages state "the initial state of this entry must be BACKUP". I'm guessing the "this entry" is in reference to the vrrp_instance name {} section. The keepalived.conf.sample file doesn't have "state BACKUP", so I'm a bit confused, and maybe the note is refering to something else being the backup?
vrrp_instance VI {
priority 101
state BACKUP
nopreempt
interface eth0
virtual_router_id 95
virtual_address {
1.2.3.4
}
}
vrrp_instance VI {
priority 100
state BACKUP
nopreempt
interface eth0
virtual_router_id 95
virtual_address {
1.2.3.4
}
}
So both slb1/2 are configured to be "state BACKUP". But when initially freshly booted, slb1 will nonetheless become the master anyway because of its higher priority and the fact that slb2 is not the master already (and in fact there are no masters at all.) Then when slb1 fails, slb2 will become the master, despite the nopreempt, because there's nothing for it to preempt and there is no active master at all. Then, when slb1 returns, because of the nopreempt, it will not take over the VIP even though it has a higher priority. Should slb2 (the now current master) fail, slb1 will thenl take over, even though it has the nopreempt directive, because now there's no master at all?
That sounds correct. One problem with that setup is that you need to bring the machines up in the correct order. With nopreempt set, you don't have an easy way to force one to become the master short of manually failing the current master.
So only slb1 (the one w/higher priority in the conf file) should have nopreempt? Would that be simpler? The way I thought it worked (if nopreempt/state BACKUP is in both config files) is as follows. Say they both reboot fresh. Whichever boots up faster (even by a tiny margin) ends up being the master. Why: because at that exact moment, there is NO master at all for virtual_router_id #95. Is that correct? Regardless of preempt/nopreempt/state, if there is no master at all, the first available system will become the master and take the VIP.

Then, even if it's a tiny margin of time later, the other machine finishes booting, and sees there is already a master. If this slower box is the one w/lower priority, that's fine and it just sits idle like it should. If however, this slower box is the higher priority machine, it obeys the nopreempt directive, and thus sits idle and does not take over the VIP because it sees there is already a master for virtual_router_id #95.

Now one machine goes down. Say it's the currently idle system, then no harm. If instead the currently acting master goes down, the other remaining box, regardless of whether it has a higher or lower priority than the machine that died, sees there is no master at all, and thus takes over the VIP.

Then, when the machine that died comes back up, it sees there is a master and obeys its nopreempt directive, regardless of its own priority.

The cycle is then complete.

Am I missing a case/scenario where a machine will not take over a VIP and become the master when the VIP doesn't exist anywhere? Or conversely am I missing a case where a machine will take over the VIP from another machine?

Because both machines start in state BACKUP, might it take a bit longer for one to realize there is no master? state BACKUP doesn't mean that it never bothers to check whether a master exists at all, right? I don't mind if there's an extra second or two of delay before the "backup" machine realizes there's no master at all and it needs to promote itself to master status and activate the VIP. As long as that delay isn't too large :)

Thank you,
PH

==
Paul Hirose
Brad Schick
2010-09-17 16:11:25 UTC
Post by Paul Hirose
So only slb1 (the one w/higher priority in the conf file) should have nopreempt? Would that be simpler? The way I thought it worked (if nopreempt/state BACKUP is in both config files) is as follows. Say they both reboot fresh. Whichever boots up faster (even by a tiny margin) ends up being the master. Why: because at that exact moment, there is NO master at all for virtual_router_id #95. Is that correct? Regardless of preempt/nopreempt/state, if there is no master at all, the first available system will become the master and take the VIP.
Then, even if it's a tiny margin of time later, the other machine finishes booting, and sees there is already a master. If this slower box is the one w/lower priority, that's fine and it just sits idle like it should. If however, this slower box is the higher priority machine, it obeys the nopreempt directive, and thus sits idle and does not take over the VIP because it sees there is already a master for virtual_router_id #95.
Now one machine goes down. Say it's the currently idle system, then no harm. If instead the currently acting master goes down, the other remaining box, regardless of whether it has a higher or lower priority than the machine that died, sees there is no master at all, and thus takes over the VIP.
Then, when the machine that died comes back up, it sees there is a master and obeys its nopreempt directive, regardless of its own priority.
The cycle is then complete.
Am I missing a case/scenario where a machine will not take over a VIP and become the master when the VIP doesn't exist anywhere? Or conversely am I missing a case where a machine will take over the VIP from another machine?
Sounds correct, just be aware that with nopreempt set slb1 will only take over if slb2 fails or doesn't exist. Simply reducing slb1 priority (as in the related thread about haproxy) would not cause the switch.
Post by Paul Hirose
Because both machines are state BACKUP to begin with, it might take a bit longer for it to realize there is no master perhaps? state BACKUP doesn't mean that it never bothers to check whether a master exists at all, right? I don't mind if there's an extra second or two of delay before the "backup" machine realizes there's no master at all and it needs to promote itself to master status and activate the VIP. As long as that delay isn't too large :)
I think that depends on the advert_int rate, not the starting state of the instance.
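For reference, the VRRP timing behind that (per RFC 3768): a backup declares the master dead after Master_Down_Interval = 3 * advert_int + skew_time, where skew_time = (256 - priority) / 256. A quick sketch with keepalived's 1-second default (the values here match the "Advert interval = 1sec" and priority from Bryan's log, but the numbers are illustrative):

```shell
advert_int=1    # seconds (keepalived default, as shown in the log above)
priority=100
awk -v a="$advert_int" -v p="$priority" 'BEGIN {
    skew = (256 - p) / 256                 # higher priority -> shorter wait
    printf "master_down_interval=%.6fs\n", 3 * a + skew
}'
# prints: master_down_interval=3.609375s
```

So with the defaults discussed here, the extra delay Paul worries about is on the order of three to four seconds.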


-Brad