Lon Hohberger wrote:
> On Wed, 2009-05-20 at 08:08 +0100, Chrissie Caulfield wrote:
>>> - if a quorum device exists and it is being reregistered with the same
>>> name, just change the votes and recalculate quorum
>> cman doesn't allow the votes to be changed without deregistering and
>> reregistering the quorum device.
>>
>> I have checked the code and I can't see any reason why doing it this way
>> would fail, if register succeeds then it allocates a new node structure
>> for the qdisk and populates it from the parameters given.
>>
>> Is it possible that qdisk might not unregister the qdisk when it is
>> stopped under some circumstances ?
>
> It's possible, but unlikely -- it only ever doesn't unregister if it:
>
> (a) hits I/O errors
> (b) is killed with -SIGKILL
> (c) cman went away (in which case it doesn't matter :) )
>
> I suspect:
>
>     if (quorum_device->state == NODESTATE_MEMBER)
>         return -EBUSY;

Yes, that sounds very likely

> ... is causing the unregister operation to fail. Maybe I need to call
> cman_poll_quorum_device(xxx, 0). It seems a bit odd.
>
> Basically, the use case is online upgrade of # of nodes in the cluster.
>
> 3 nodes + 2-vote quorum device ==> 4 nodes + 3-vote quorum device
>
> In my mind, it'd work like:
>
> * Ensure all current members are up and healthy
>    * each old member sees: votes = 3 + 2
> * Update cluster.conf w/ new member.
> * Copy cluster.conf to new member
>    * each old member sees: votes = 4 + 2
> * Have new member start cluster stack
>    * each old member sees: votes = 4 + 2
>    * the new member sees: votes = 4 + 3
> * Stop qdiskd on the old nodes
>    * each old member sees: votes = 4
>    * the new member sees: votes = 4 + 3
> * Restart qdiskd on the old nodes
>    * everyone is consistent w/ 4 + 3
>
> I don't think calling poll(0) will make a difference in the above case,
> but I had gotten used to the fact that if you kill qdiskd you had a few
> seconds to restart it before CMAN noticed...
>
> So, I can fix it I think with poll(0), but if an admin kills qdiskd with
> SIGKILL (or any other fatal signal), restarting qdiskd will prevent
> correct vote registration (though as I have found out, polling still
> works great).

When qdiskd restarts, if you get EBUSY from _register then you could
deregister and reregister with the new information.

There's an argument here for a cman API call to change the number of
votes associated with the quorum disk though ... what do you think ?

Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
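
For illustration only, here is a rough sketch of the "deregister and
reregister on EBUSY" idea from the reply above. It does not call libcman;
the mock_* functions and struct mock_cman are hypothetical stand-ins that
model the behaviour discussed in this thread (register refuses while a
device is already registered, and unregister refuses while the device is
still marked as a member, which is only Lon's suspicion). The helper
register_with_retry() is likewise an invented name, not part of qdiskd or
cman.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of cman's view of the quorum device (not real cman code). */
struct mock_cman {
    int registered;   /* 1 if a quorum device is registered */
    int is_member;    /* 1 if the last poll marked the device available */
    int votes;        /* votes currently credited to the device */
    char name[64];
};

/* Stand-in for a register call: EBUSY if a device is already registered. */
static int mock_register(struct mock_cman *c, const char *name, int votes)
{
    assert(votes >= 0);
    if (c->registered) { errno = EBUSY; return -1; }
    snprintf(c->name, sizeof(c->name), "%s", name);
    c->votes = votes;
    c->registered = 1;
    c->is_member = 0;
    return 0;
}

/* Stand-in for a poll call: records whether the device is available. */
static int mock_poll(struct mock_cman *c, int available)
{
    if (!c->registered) { errno = EINVAL; return -1; }
    c->is_member = available ? 1 : 0;
    return 0;
}

/* Stand-in for unregister: assumed to return EBUSY while still a member. */
static int mock_unregister(struct mock_cman *c)
{
    if (!c->registered) { errno = EINVAL; return -1; }
    if (c->is_member) { errno = EBUSY; return -1; }
    c->registered = 0;
    c->votes = 0;
    c->name[0] = '\0';
    return 0;
}

/* Try to register; on EBUSY, mark unavailable, unregister, then register again. */
static int register_with_retry(struct mock_cman *c, const char *name, int votes)
{
    if (mock_register(c, name, votes) == 0)
        return 0;
    if (errno != EBUSY)
        return -1;                    /* some other failure: give up */
    if (mock_poll(c, 0) != 0)         /* the poll(0) step from the thread */
        return -1;
    if (mock_unregister(c) != 0)
        return -1;
    return mock_register(c, name, votes);
}
```

In this model, a qdiskd that was killed while holding a 2-vote
registration could come back and call register_with_retry() with 3 votes,
and the stale entry would be replaced rather than left in place. Whether
real cman behaves this way depends on the unregister path discussed
above.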