Re: failover scenarios for replication

Paul Dekkers <Paul.Dekkers@xxxxxxxxxx> · Tue, 29 Aug 2006 10:11:42 +0200

Hi,

Bron Gondwana wrote:
> On Mon, Aug 28, 2006 at 02:23:22PM -0400, Wesley Craig wrote:
>   
>> On 26 Aug 2006, at 16:09, Paul Dekkers wrote:
>>     
>>> Right now, it looks tricky to me to enable replication after failover,
>>> or the replicated machine itself if you're not sure that the replica is
>>> identical and the sync-processes finished completely: if a message-file
>>> is in place on machine A (say "7.") but it was not replicated to machine
>>> B while that one becomes the master, the machine B will create a new
>>> file 7. and both machines consider this file synchronised after that:
>>> also if roles switch back, you have two different (with one isolated)
>>> copies of 7.
>>>       
>> As I understand it, this is what replication uuids are for.  Not that  
>> I've experimented with this particular case.
>>     
>
> All that replication UUIDs do is make sure that the copy of '7' on
> the master overwrites the copy of '7' on the replica.  It doesn't make
> any attempt to retain '7' from the replica.
>   

It doesn't replace it either. Once I have a different copy of '7' on my
replica, it never gets rewritten, as far as I can tell from this simple
experiment:

'haver' is the master, 'gerst' is the replica.

[root@gerst paul]# md5sum 7.
599307e354e203b706a7ba88d6ad668c  7.
[root@gerst paul]# md5sum 11.
md5sum: 11.: No such file or directory

[root@haver paul]# sudo -u cyrus /usr/lib/cyrus-imapd/sync_client -v -u paul
USER paul
[root@haver paul]# md5sum 7.
32187646fe6176e989b9b59c59f7af9e  7.
[root@haver paul]# md5sum 11.
41d62ed42df1f058a76831061fb0c4ca  11.

[root@gerst paul]# md5sum 7.
599307e354e203b706a7ba88d6ad668c  7.
[root@gerst paul]# md5sum 11.
41d62ed42df1f058a76831061fb0c4ca  11.

The replica still has its previous '7', while the new '11' is replicated
correctly. I don't know whether this is the correct behavior, to be honest.
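
To repeat this check across a whole mailbox instead of running md5sum by
hand, something like the following might do. It is only a rough sketch:
the function names are made up for illustration, it assumes message files
are named '<uid>.' as in the example above, and you would have to run
spool_digests() on each host and bring the two results together yourself.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def spool_digests(spool_dir):
    """Map message file names (e.g. '7.') in one mailbox directory to MD5 digests."""
    digests = {}
    for entry in Path(spool_dir).iterdir():
        name = entry.name
        # Assumption: message files are '<uid>.'; everything else is skipped.
        if entry.is_file() and name.endswith(".") and name[:-1].isdigit():
            digests[name] = md5_of(entry)
    return digests


def compare_spools(master, replica):
    """Return (differing, only_on_master, only_on_replica) as sorted name lists."""
    differing = sorted(
        n for n in master.keys() & replica.keys() if master[n] != replica[n]
    )
    only_on_master = sorted(master.keys() - replica.keys())
    only_on_replica = sorted(replica.keys() - master.keys())
    return differing, only_on_master, only_on_replica
```

Fed with the digests from the transcript above (after the sync),
compare_spools() would report '7.' as differing and nothing as missing on
either side, which is exactly the state that goes unnoticed today.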

(From one perspective I think this is a good thing: if there is a
different message(-id) on the other host, no matter what number it has,
it should remain there, the new message should be added alongside it, and
maybe the extra message on the replica should even get replicated back to
the master. On the other hand, this causes inconsistencies, and this is
not bidirectional synchronization: the master should be right (unless
there was a failover ;-)), so just replace the thing. Hmm, I have to
think that over; it still sounds a bit scary, and I've had inconsistent
'but still running' filesystems before...)

>>> Or is it only preferred to use a replica if there is a really serious
>>> crash on the (previous) master?
>>>       
>> That's certainly how I view the current system.  Until replication is  
>> more reliable, I'd be quite leery of any sort of automatic failover.
>>     
>
> Ditto.  Our 'init scripts' actually check a database table to see which
> role a particular instance on a machine has and then starts up in that
> mode.  Changing over the database table entry is a manual step.
>
> The master init script also attempts to run the remaining log files with
> sync client if there are any.

Hmm, sounds like a good idea. (Although you indeed can't do that with an
unreachable replica ;-) For now I can live with that; maybe I'd just put
another check in front of it to see whether the replica is available,
check_tcp from Nagios or something.)
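
The kind of check I have in mind could look roughly like this. It is just
a sketch: the function names are mine, the port the replica's sync_server
listens on depends on your cyrus.conf and /etc/services, and the command
to run is whatever your init script would otherwise start unconditionally.

```python
import socket
import subprocess


def replica_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def run_if_reachable(host, port, command):
    """Run command (an argv list) only if the replica answers on host:port.

    Returns the command's exit status, or None if the replica was
    unreachable and the command was skipped.
    """
    if not replica_reachable(host, port):
        return None
    return subprocess.run(command).returncode
```

In my setup the call would be something like run_if_reachable("gerst",
port, [...]) with the sync_client command line from the init script; a
None result means the log replay was skipped and should be retried later.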

>   Sadly, sync_client doesn't interact well
> with real-time requirements and the replica being away.  Bah.  I'll
> get back to my "-o" => "only try to connect ONCE" patch one day.
>
>   
>>> It sounds nice to me if I could use heartbeat or (u)carp (/ifstated)
>>> like systems to start and stop a sync_client or sync_server copy of
>>> cyrus (both different cyrus.conf) as soon as the state of the virtual
>>> interface changes, but then it is even more likely that some replication
>>> process is not finished without an admin even noticing it.
>>>       
>> I agree, this is a great goal.  I'd be interested in seeing a roadmap  
>> for how to achieve it, including how failback would occur.  There's a  
>> lot of opportunity to share operational experience with Cyrus.  If  
>> only there was a forum to publish such information...
>>     
>
> Yeah, I've had a play with using heartbeat.  The downside is that its
> colocation works, but ordering operations without having dependencies
> take the other side down as well doesn't work properly.  You can't say
> "always start the master in preference" and "start the replica first
> if you can" (makes master startup actually work at the moment!).
>   

I haven't tried this, but does it hurt to define the sync_server, imapd
and friends processes in the replica's cyrus.conf, and so have it
identical to the master's?
If we're not using them while in replica mode, I'm curious whether it will
hurt (same for the sync_server on the master). Then the only thing you'd
switch between being a replica or a master is the sync_client, which is
something we currently take out of the control of cyrus.conf anyway ;-)

(I'm still thinking of nice ways to control (and automatically restart)
the sync_client. It doesn't write out a pid, daemon (on FreeBSD) doesn't
create the right pidfile for it, so things like monit or the
restartwrapper fail to control it... It doesn't stay in the foreground
while in rolling mode... Maybe just have check_procs from Nagios look at
the process string (or anything else that looks in /proc or ps). What do
others do? Wesley's suggestion of having a separate init script in the
same runlevel still looks 'manual' to me, and it's also not the part
that generates an alert.
Maybe I'll write a patch for staying in the foreground and/or writing out
a pidfile ;-))
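
Until such a patch exists, a process-list check in the spirit of
check_procs could be sketched like this. The function names are mine, the
exit codes follow the usual Nagios plugin convention (0 = OK, 2 =
CRITICAL), and it assumes the process shows up in ps with "sync_client"
as the basename of its first argument.

```python
import os
import subprocess


def find_processes(ps_output, name):
    """Return (pid, args) for each ps line whose command basename equals name.

    Expects lines of the form '<pid> <command line>', as printed by
    'ps -A -o pid= -o args='.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        if os.path.basename(args.split()[0]) == name:
            matches.append((pid, args))
    return matches


def check_sync_client(name="sync_client"):
    """Nagios-style check: return 0 if the process is running, 2 if not."""
    ps = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    found = find_processes(ps.stdout, name)
    if found:
        print("OK: %d %s process(es) running" % (len(found), name))
        return 0
    print("CRITICAL: no %s process running" % name)
    return 2
```

This only tells you that a sync_client exists, not that it is making
progress, so it complements rather than replaces a look at the sync log
directory.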

Paul
