--- Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:

> Ja S wrote:
>> --- Ja S <jas199931@xxxxxxxxx> wrote:
>>
>>>>> A couple of further questions about the master copy of lock resources.
>>>>>
>>>>> The first one:
>>>>> =============
>>>>>
>>>>> Again, assume:
>>>>> 1) Node A is extremely busy and handles all requests
>>>>> 2) the other nodes are idle and have never handled any requests
>>>>>
>>>>> According to the documents, Node A will initially hold all the master
>>>>> copies. What I am not clear about is whether the lock manager will
>>>>> evenly distribute the master copies on Node A to the other nodes once
>>>>> it thinks Node A holds too many of them.
>>>>
>>>> Locks are only remastered when a node leaves the cluster. In that case
>>>> all of its resources will be moved to another node. We do not do
>>>> dynamic remastering - a resource that is mastered on one node will stay
>>>> mastered on that node regardless of traffic or load, until all users of
>>>> the resource have been freed.
>>>
>>> Thank you very much.
>>>
>>>>> The second one:
>>>>> ==============
>>>>>
>>>>> Assume the master copy of a lock resource is on Node A and Node B holds
>>>>> a local copy of the same resource. When the lock queues change on the
>>>>> local copy on Node B, will the master copy on Node A be updated
>>>>> simultaneously? If so, when more than one node has a local copy of the
>>>>> same lock resource, how does the lock manager handle the update of the
>>>>> master copy? Does it use another locking mechanism to prevent
>>>>> corruption of the master copy?
>>>>
>>>> All locking happens on the master node. The local copy is just that, a
>>>> copy. It is updated when the master confirms what has happened. The
>>>> local copy is there mainly for rebuilding the resource table when a
>>>> master leaves the cluster, and to keep track of locks that exist on the
>>>> local node. The local copy is NOT complete; it only contains local
>>>> users of a resource.
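
Just to make sure I now have the model straight, the way I picture what you
describe above is roughly the toy sketch below. This is not the real dlm
code, only an illustration of the idea that the full grant queue lives on
the master copy, while a local copy records nothing but that node's own
locks and is only updated once the master has confirmed the request.
Please correct me if this picture is wrong.

# Toy model only -- not the real dlm implementation, just the behaviour
# described above: all locking is decided on the master node, and a local
# copy holds only the locks owned by its own node, updated after the
# master confirms the operation.

class MasterCopy:
    def __init__(self, name):
        self.name = name
        self.granted = []    # full grant queue, exists only on the master node

class LocalCopy:
    def __init__(self, name, master_node):
        self.name = name
        self.master_node = master_node
        self.granted = []    # only this node's own locks, never the full queue

def request_lock(master_copy, local_copy, requesting_node, mode):
    # Assumes the requesting node is not the master (otherwise there is no
    # local copy to update).
    # 1. The request is sent to the master node and decided there.
    master_copy.granted.append((requesting_node, mode))
    # 2. Only after the master confirms does the requester update its local
    #    copy, and only with its own lock entry.
    local_copy.granted.append(mode)
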
>>> Thanks again for the kind and detailed explanation.
>>>
>>> I am sorry to bother you again, but I have more questions. I analysed
>>> /proc/cluster/dlm_dir and dlm_locks and found some strange things.
>>> Please see below.
>>>
>>> From /proc/cluster/dlm_dir:
>>>
>>> In lock space [ABC], this node (node 2) has 445 lock resources in total,
>>> of which:
>>> -- 328 are master lock resources
>>> -- 117 are local copies of lock resources mastered on other nodes
>>>
>>> From /proc/cluster/dlm_locks:
>>>
>>> In lock space [ABC], there are 1678 lock resources in use, of which:
>>> -- 1674 are mastered by this node (node 2)
>>> -- 4 are mastered by other nodes:
>>> ---- 1 lock resource mastered on node 1
>>> ---- 1 lock resource mastered on node 3
>>> ---- 1 lock resource mastered on node 4
>>> ---- 1 lock resource mastered on node 5
>>>
>>> A typical master lock resource in /proc/cluster/dlm_locks is:
>>>
>>> Resource 000001000de4fd88 (parent 0000000000000000).
>>> Name (len=24) " 3 5fafc85"
>>> Master Copy
>>> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> Granted Queue
>>> 1ff5036d NL Remote: 4 000603e8
>>> 80d2013f NL Remote: 5 00040214
>>> 00240209 NL Remote: 3 0001031d
>>> 00080095 NL Remote: 1 00040197
>>> 00010304 NL
>>> Conversion Queue
>>> Waiting Queue
>>>
>>> After searching for local copies in /proc/cluster/dlm_locks, I got:
>>>
>>> Resource 000001002a273618 (parent 0000000000000000).
>>> Name (len=16) "withdraw 3......"
>>> Local Copy, Master is node 3
>>> Granted Queue
>>> 0004008d PR Master: 0001008c
>>> Conversion Queue
>>> Waiting Queue
>>>
>>> --
>>> Resource 000001003fe69b68 (parent 0000000000000000).
>>> Name (len=16) "withdraw 5......"
>>> Local Copy, Master is node 5
>>> Granted Queue
>>> 819402ef PR Master: 00010317
>>> Conversion Queue
>>> Waiting Queue
>>>
>>> --
>>> Resource 000001002a2732e8 (parent 0000000000000000).
>>> Name (len=16) "withdraw 1......"
>>> Local Copy, Master is node 1
>>> Granted Queue
>>> 000401e9 PR Master: 00010074
>>> Conversion Queue
>>> Waiting Queue
>>>
>>> --
>>> Resource 000001004a32e598 (parent 0000000000000000).
>>> Name (len=16) "withdraw 4......"
>>> Local Copy, Master is node 4
>>> Granted Queue
>>> 1f5b0317 PR Master: 00010203
>>> Conversion Queue
>>> Waiting Queue
>>>
>>> These four local copies of lock resources have been sitting in
>>> /proc/cluster/dlm_locks for several days.
>>>
>>> Now my questions:
>>>
>>> 1. In my case, for the same lock space, the number of master lock
>>> resources reported by dlm_dir is much SMALLER than the number reported
>>> in dlm_locks. My understanding was that the number of master lock
>>> resources listed in dlm_dir must be larger than, or at least equal to,
>>> the number reported in dlm_locks. The situation I see on this node does
>>> not make any sense to me. Am I missing anything? Can you help me
>>> clarify this?
>>
>> I have found the answer. Yes, I did miss something: I need to sum the
>> lock resources mastered by this node across all cluster members. Done
>> that way, the total number of lock resources mastered by this node is
>> 1674, which matches the number reported in dlm_locks. Sorry for asking
>> the question without thinking it through.
>>
>>> 2. What can cause "withdraw ...." to be the lock resource name?
>>
>> After reading the gfs source code, it seems that this is caused by
>> issuing a command like "gfs_tool withdraw <mountpoint>". However, I
>> checked the command histories on all nodes in the cluster and did not
>> find any such command. This question and the next one remain open.
>> Please help.
>
> You might like to ask GFS-specific questions in a new thread. I don't
> know about GFS, and the people who do are probably not reading this one
> by now ;-)
>
>>> 3. As far as I know, these four local copies of lock resources have not
>>> been released for at least several days. How can I find out whether
>>> they are stuck in some dead situation or are still waiting for the lock
>>> manager to release them? How do I change the timeout?
>
> There is no lock timeout for local copies. If a lock is shown in
> dlm_locks then either the lock is active somewhere or you have found a
> bug!
>
> Bear in mind that this is a DLM answer; GFS does cache locks, but I
> don't know the details.

Thank you for the information.
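
In case it helps anyone else who trips over the same dlm_dir vs dlm_locks
numbers, below is a rough, untested sketch of how one might tally a saved
/proc/cluster/dlm_locks dump. It assumes only the output format shown above
("Master Copy" / "Local Copy, Master is node N"); the script name in the
usage comment is just an example. Collect one dump per node and sum the
"Master Copy" counts to get the cluster-wide per-node totals.

#!/usr/bin/env python
# Rough sketch, untested against a live cluster: count the resources a node
# masters ("Master Copy") and the local copies of remotely mastered
# resources ("Local Copy, Master is node N") in a saved dlm_locks dump.

import re
import sys
from collections import defaultdict

def count_copies(path):
    masters = 0                          # resources mastered by the dumping node
    local_by_master = defaultdict(int)   # master node id -> local copies held here
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == "Master Copy":
                masters += 1
            else:
                m = re.match(r"Local Copy, Master is node (\d+)", line)
                if m:
                    local_by_master[int(m.group(1))] += 1
    return masters, local_by_master

if __name__ == "__main__":
    # usage: count_dlm_locks.py <dump from node 1> [<dump from node 2> ...]
    for path in sys.argv[1:]:
        masters, local_by_master = count_copies(path)
        print("%s: %d mastered here, %d local copies of remote masters"
              % (path, masters, sum(local_by_master.values())))
        for node, n in sorted(local_by_master.items()):
            print("    %d mastered on node %d" % (n, node))

Run against the full dump from node 2 it should reproduce the 1674 / 4
split quoted above, and the per-node "mastered here" counts are the ones
to compare with what dlm_dir reports.
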
Best,
Jas