On Fri, 2004-10-08 at 17:49, Daniel Phillips wrote:
> On Thursday 07 October 2004 18:36, Daniel McNeil wrote:
> > On Thu, 2004-10-07 at 10:58, Daniel Phillips wrote:
> > > On Thursday 07 October 2004 12:08, Lon Hohberger wrote:
> > > > On Thu, 2004-10-07 at 02:07, David Teigland wrote:
> > > > > Using DLM locks is a third way you might solve this problem
> > > > > without using SM or RM; I don't understand the details of how
> > > > > that might work yet but it sounds interesting.
> > > >
> > > > It's primarily because it assumes that the DLM or GuLM can
> > > > ensure that only one exclusive lock is granted at a time.
> > > > Because of this, the holder of the lock would thereby become
> > > > the csnap master server, and the old master will either have
> > > > been fenced or relinquished its duties willfully (and thus is
> > > > no longer a threat).
> > >
> > > Suppose that the winner of the race to get the exclusive lock is
> > > a bad choice to run the server.  Perhaps it has a fast connection
> > > to the net but is connected to the disk over the network instead
> > > of directly like the other nodes.  How do you fix that, within
> > > this model?
> >
> > Good question.  Another good question is how would a resource
> > manager know to pick the "best" choice?
> >
> > It would seem to me that the csnap-server is the best one to know
> > if this node is a good choice or not.
> >
> > I can think of a few ways of handling this:
> >
> > 1. If this node is not a good choice to run csnap-server, do not
> >    run it at all.  If this node is not directly connected to the
> >    disk and is using the net to some other node, that other node
> >    has to be running, so that node can be the csnap-server.
> >
> > 2. Use 2 DLM locks: 1 for "better" choices (directly connected,
> >    faster connection) and 1 for "other" choices.  The "better"
> >    csnap-servers go for the "better" lock exclusive, while the
> >    "other" csnap-servers take the "better" lock for read and the
> >    "other" lock exclusive.  If a csnap-server gets the "better"
> >    lock exclusive, he is the master.  If a csnap-server gets the
> >    "better" lock for read AND the "other" lock exclusive, he's the
> >    master.  The same works for multiple priorities.
> >
> > 3. If a csnap-server gets the lock to be master and he is not the
> >    best choice, the server can check whether other csnap-servers
> >    are queued behind him.  If there are, he can unlock the lock and
> >    then re-lock it to give another node the chance to be master.
>
> There are a few problems with this line of thinking:
>
>  - You will be faced with the task of coding every possible resource
>    metric into some form of locking discipline.
>
>  - Your resource metrics are step functions, the number of steps
>    being the number of locking layers you lather on.  Real resource
>    metrics are more analog than that.
>
>  - You haven't done anything to address the inherent raciness of
>    giving the lock to the first node to grab it.  Chances are good
>    you'll always be giving it to the same node.

Daniel,

I do not think of these as "problems".  You never answered: how would
a resource manager know to pick the "best" choice?

The cluster is made up of software components (see the pretty picture
attached).  IMHO, it would be good to follow some simple rules:

1. Components higher on the stack should only depend on components
   lower on the stack.  Let's avoid circular dependencies.

2. When possible, use "standard" components and APIs.
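To be concrete about the two-lock idea quoted above as option 2, here
is roughly what I mean.  This is only a sketch in C with made-up
names: lock_sync() stands in for a blocking DLM request (something
like dlm_lock_wait() in libdlm, or whatever the userland API settles
on), and MODE_PR/MODE_EX stand in for protected-read and exclusive
modes.

/*
 * Sketch of the two-lock priority scheme.  "Better" nodes (directly
 * attached to the disk) compete only for the "better" lock.  "Other"
 * nodes hold the "better" lock shared, so any number of them can get
 * it, and then compete among themselves for the "other" lock.
 */
enum dlm_mode { MODE_PR, MODE_EX };

/* Pretend wrapper: block until the named cluster-wide lock is granted
 * in 'mode'.  Returns 0 on success.  A real version would call into
 * the DLM. */
static int lock_sync(const char *name, enum dlm_mode mode)
{
	(void)name;
	(void)mode;
	return 0;		/* stub so the sketch compiles */
}

/* Returns 1 when this node should run the csnap master, -1 on a
 * locking error. */
static int become_master(int directly_attached)
{
	if (directly_attached) {
		if (lock_sync("csnap-better", MODE_EX))
			return -1;
		return 1;	/* better EX: we are the master */
	}

	if (lock_sync("csnap-better", MODE_PR))
		return -1;
	if (lock_sync("csnap-other", MODE_EX))
		return -1;
	return 1;		/* better PR + other EX: we are the master */
}

The same trick extends to more than two priority levels by adding more
lock names.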
We have agreed on some common components:

	DLM
	cluster membership and quorum
	cluster communications (sort of)

AFAICT, resource management is higher up the stack, and having shared
storage like the cluster snapshot depend on it would cause circular
dependencies.

SM is a Sistina/Redhat-specific thing.  It might be wonderful, but it
is not common.  David's email leads me to believe it is not the right
component to interface with.

So, what is currently implemented that we have to work with?
Membership and the DLM.  These are core services and seem to be
pretty solid right now.  So how can we use them?  Seems fairly simple:

1st implementation:
===================

Add a single DLM lock in the csnap server.  When a snapshot target is
started, start up a csnap server.  If the csnap server gets the lock,
he is master.

In normal operation, the csnap server is up and running on all nodes.
One node has the DLM lock and the others are ready to go, but waiting
for the DLM lock to convert.  On failure, the next node to get the
lock is master.

If the administrator knows which machine is "best", have him start
the snapshot targets on that machine first.  Not perfect, but simple,
and it provides high availability.

It is also possible for the csnap server to put its server address
and port information in the LVB.

This seems simple, workable, and easy to program.  (There is a rough
sketch in the P.S. below.)

Follow-on implementations
=========================

Maybe multiple DLM locks for priorities, other options...

Questions:

I do not understand what you mean by the inherent raciness.  Once a
cluster is up and running, the first csnap server starts up.  It does
not stop until it dies, which I assume is rare.  What raciness are
you talking about?  How complicated a resource metric were you
thinking about?

I have read through the design doc and am still thinking about client
reconnect.  Are you planning on implementing the 4-message snapshot
read protocol?

There must be some internal cluster communication mechanisms for
membership (cman) and the DLM to work.  Is there some reason why
these are not suitable for snapshot client to server communication?

Thanks,

Daniel
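P.S.  To be concrete about the 1st implementation above, here is the
sort of thing I have in mind.  Again, only a sketch with made-up
names: master_lock_ex() stands in for a blocking exclusive request on
one cluster-wide "csnap-master" lock, and master_lock_write_lvb() for
updating the lock value block while we hold the lock exclusively
(something like dlm_lock_wait() with LKF_VALBLK in libdlm, or whatever
the userland API settles on).

#include <stdio.h>

#define LVB_LEN 32	/* DLM lock value blocks are small (32 bytes) */

/* Pretend wrapper: block until the "csnap-master" lock is granted
 * exclusive.  Every node's csnap server parks here; exactly one is
 * granted the lock, the rest queue in the DLM and wake up only when
 * the holder dies (and is fenced) or unlocks. */
static int master_lock_ex(void)
{
	return 0;		/* stub so the sketch compiles */
}

/* Pretend wrapper: store our address:port in the lock value block so
 * clients and the next master can find the server. */
static int master_lock_write_lvb(const char lvb[LVB_LEN])
{
	(void)lvb;
	return 0;		/* stub so the sketch compiles */
}

static int run_csnap_master(const char *my_addr, int my_port)
{
	char lvb[LVB_LEN];

	if (master_lock_ex())
		return -1;	/* error talking to the DLM */

	/* We won the lock: advertise where clients should connect. */
	snprintf(lvb, sizeof(lvb), "%s:%d", my_addr, my_port);
	if (master_lock_write_lvb(lvb))
		return -1;

	/* ...start serving snapshot requests here... */
	return 0;
}

On failover, the next queued node simply wakes up in master_lock_ex(),
writes its own address and port into the LVB, and takes over.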
Attachment: arch.png (PNG image)