On Fri, 2004-10-08 at 17:25 -0400, Daniel Phillips wrote:

> Node 3 won the race to get the EX lock because the lock is mediated over
> the GigE network.  But Node 3 is a bad choice because it is two hops
> away from the disk.  The DLM chose Node 3 because the DLM doesn't know
> anything about network topology, just who got there first to grab the
> lock.

Ok, got it!

> > (1) Don't set up your csnap server in such a way that some the nodes
> > exhibit a bottleneck on disk I/O and some do not.
>
> But what prevents it?  How do you "set up your csnap server"?

Nothing prevents it.  Sorry, "csnap server cluster".

> Why would you want to introduce new rules about cluster topology instead of
> fixing the code?

It doesn't introduce a hard rule, it will still work, and it's very,
very simple.  The only rule it introduces is something like this:

"For equivalent csnap server performance from all nodes, set your
cluster up with equivalent performing nodes and connections."

Again, it was an answer to your question about how to make the
single-cluster-lock model work.

> > (2) Have the administrator make an intelligent decision as to whether
> > or not to relocate the csnap master server again as [s]he tries to
> > fix the problem that caused the failover.  I.E. Don't worry about it
> > if the csnap master server is running slowly.
>
> The administrator is normally asleep or busy with girlfriend when
> anything goes wrong.

Fair enough.  Admittedly, it's an inconvenience.  Luckily, it's rare...
Though still not a good thing.  You can use rgmanager to provide
priority levels of nodes (e.g. highest priority == fastest to disk),
but... (see way below)

> > Your clients still work, and the csnap server is available, albeit at
> > a potentially degraded state.
>
> Well...

Out of curiosity, in the configuration above, how much of an actual
performance bottleneck do you expect to incur, given that (I think) the
csnap writes are synchronous?

> > (3) Don't use the cluster-lock model.  It has its shortcomings.  Its
> > strengths are in its simplicity; not its flexibility.
>
> Yes, that's the one.  We need real resource management, even if it
> initially just consists of an administrator setting up config files.
> Something has to read those config files[1] and respond to server
> instance requests from csnap agents accordingly.

Heh :)

> [1] At cluster bring-up time.  The resource manager has to be able to
> operate without reading files during failover.

Existing CRMs will not do this, at least, not the ones I've looked at.
lrmd (new linux-ha stuff) and heartbeat (older) and rgmanager all
fork/exec scripts on the local file system to control resources and
applications, implying that none will work with csnap server.

D'oh! :(

-- Lon