On Thu, 2004-10-07 at 13:58 -0400, Daniel Phillips wrote:
> Suppose that the winner of the race to get the exclusive lock is a bad
> choice to run the server.  Perhaps it has a fast connection to the net
> but is connected to the disk over the network instead of directly like
> the other nodes.  How do you fix that, within this model?

Let me see if I am getting this use-case picture right, first of all.
(Csnap[letter] are potential csnap servers; Clients are csnap clients
when Csnap[letter] is the csnap master server.)

iSCSI |
      | <---Gig-E---> CsnapA <--100Mbit--> Clients
      | <--100Mbit--> CsnapB <---Gig-E---> Clients

Or...

SAN |
    | <-- FC --> iSCSI/GNBD Export
    |              ^--100Mbit--> CsnapA <-Gig-E-> Clients
    |              +-----------> CsnapB <-Gig-E-> Clients
    |              +-----------> CsnapC <-Gig-E-> Clients
    |
    | <-- FC --> CsnapD <------ 100Mbit ------> Clients
    | <-- FC --> CsnapE <------ 100Mbit ------> Clients

Is either of those close?

(1) Don't set up your csnap server in such a way that some of the nodes
    exhibit a bottleneck on disk I/O and some do not.

    Given the potentially high performance requirements of the cluster
    snapshot server, and assuming predictable performance is, in fact, a
    consideration (if not "important"), please explain why (1) is an
    unreasonable constraint.

    This is not necessarily related to picking one failover model over
    another; I'm just trying to understand precisely why this use case
    is so interesting.

(2) Have the administrator make an intelligent decision about whether
    or not to relocate the csnap master server again as [s]he tries to
    fix the problem that caused the failover.  I.e., don't worry about
    it if the csnap master server is running slowly: your clients still
    work, and the csnap server is available, albeit in a potentially
    degraded state.

(3) Don't use the cluster-lock model.  It has its shortcomings.  Its
    strengths are in its simplicity, not its flexibility.

-- Lon
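
As an aside, the contrast between the two election styles under
discussion can be sketched in a few lines.  This is a toy model, not
csnap code: the names (Node, elect_by_lock_race, elect_by_preference)
and the scoring scheme are hypothetical, chosen only to illustrate how
a policy-aware election could avoid picking a node with a poor disk
path even when it wins the lock race.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    lock_latency_ms: float  # how quickly this node can grab the cluster lock
    disk_score: int         # higher = better disk path (e.g. direct FC vs. 100Mbit iSCSI)

def elect_by_lock_race(nodes):
    """Cluster-lock model: whichever node wins the race for the
    exclusive lock becomes the csnap master server."""
    return min(nodes, key=lambda n: n.lock_latency_ms)

def elect_by_preference(nodes):
    """A weighted variant: prefer the best disk path, breaking ties
    by lock latency."""
    return min(nodes, key=lambda n: (-n.disk_score, n.lock_latency_ms))

nodes = [
    Node("CsnapA", lock_latency_ms=1.0, disk_score=1),  # fast net, disk over the network
    Node("CsnapD", lock_latency_ms=5.0, disk_score=3),  # direct FC to the SAN
]

print(elect_by_lock_race(nodes).name)   # -> CsnapA: wins the race...
print(elect_by_preference(nodes).name)  # -> CsnapD: ...but has the better disk path
```

The point being that the lock race encodes no placement policy at all,
whereas the weighted variant needs per-node information that something
(an administrator, or the cluster infrastructure) has to supply.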