> > > [1] At cluster bring-up time. The resource manager has to be able
> > > to operate without reading files during failover.
> >
> > Existing CRMs will not do this, at least, not the ones I've looked
> > at. lrmd (new linux-ha stuff) and heartbeat (older) and rgmanager all
> > fork/exec scripts on the local file system to control resources and
> > applications, implying that none will work with the csnap server.
> >
> > D'oh! :(
>
> This problem is not specific to the csnap server, it's a problem for
> anything that lies in the failover path and wants resource management.
> You could say that this problem is specific to servers, because they
> tend to attract a lot of network and disk traffic to themselves.

No, being a block device really does make it a special problem. A client
sitting below the VM layer can tie up a lot of memory that can't be
swapped out. When an NFS server goes away, its clients do not deadlock
themselves, but they are above the VM layer.

The other device that could have this problem is the cluster mirror
device, but Jon is taking a completely different approach, and I think
the results will be very instructive.

Here's what I mean. GNBD was developed before we had a resource manager,
and before service manager was stable. GULM doesn't provide all the
capabilities that SM does, so I decided that the GNBD clients would do
failure detection themselves. This more than doubled the complexity of
the gnbd code, and caused our support department to feverishly beg that
this solution never be used again.

I am currently working on getting the csnap device to plug into
rgmanager for failover. There will possibly be some cases where a client
could have detected an error that rgmanager will miss, but I don't think
they will be major, and it is a much simpler approach. On the downside,
we've added a lot of userspace overhead to the failover process.
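For readers not familiar with the CRMs mentioned above: the fork/exec model boils down to the cluster manager running a local control script per resource. A minimal sketch of such a script is below; the function name, actions, and messages are purely illustrative, not the API of any real rgmanager/lrmd/heartbeat agent:

```shell
# Hypothetical sketch of the kind of resource-control script that
# rgmanager, lrmd, and heartbeat fork/exec from the local filesystem.
# The names and messages here are invented for illustration.
resource_agent() {
    case "$1" in
        start)  echo "starting resource" ;;   # bring the service up
        stop)   echo "stopping resource" ;;   # tear it down on failover
        status) echo "resource running" ;;    # periodic health check
        *)      echo "usage: resource_agent {start|stop|status}" >&2
                return 1 ;;
    esac
}
```

The point made in the thread is precisely that a script like this lives on the local filesystem, so a CRM that has to exec it during failover may need to read files at exactly the moment the block device below the VM layer cannot service that read.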
AFAIK, Jon is currently planning on having the cluster mirror device
completely in kernel, and relying on SM for failure detection. This is a
very lightweight approach, which I think is about as resistant to memory
pressure as the linux kernel will allow us to be. On the downside, I
believe that Service Manager's only failure detection mechanism is an
unexpected socket closing. This leaves obvious cases where rgmanager
could detect a failure that SM will miss. It also forces the cluster
mirror code to do more self-checking than the csnap server needs to do.

If the cluster mirror is robust enough, it will win hands down on
simplicity. (The server will be in kernel, but the design could be used
with a userspace server.) If the design is not robust enough, then some
variety of the csnap design may be the best compromise between
robustness and simplicity. Right now, I wouldn't be willing to bet on
one or the other, but I'm interested in seeing how this plays out.

-Ben

> You could go on to say "but if the writeout path didn't have any servers
> in it, I wouldn't have to do any resource management!" That's correct
> except it's a lot easier to get some acceptable form of resource
> management working than to distribute the on-disk data structures
> involved.
>
> I think the biggest part of this problem is just defining what's needed.
> How hard could it be to implement?
>
> Oh, is the resource manager to be distributed, or will it be a
> server? ;-)
>
> Regards,
>
> Daniel
>
> --
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster