Ja S wrote:
> --- Ja S <jas199931@xxxxxxxxx> wrote:
>
>>>> A couple of further questions about the master copy of lock
>>>> resources.
>>>>
>>>> The first one:
>>>> =============
>>>>
>>>> Again, assume:
>>>> 1) Node A is extremely busy and handles all requests
>>>> 2) the other nodes are idle and have never handled any requests
>>>>
>>>> According to the documents, Node A will hold all master copies
>>>> initially. What I am unclear about is whether the lock manager
>>>> will evenly distribute the master copies on Node A to other nodes
>>>> when it thinks the number of master copies on Node A is too many?
>>>
>>> Locks are only remastered when a node leaves the cluster. In that
>>> case all of its locks will be moved to another node. We do not do
>>> dynamic remastering - a resource that is mastered on one node will
>>> stay mastered on that node regardless of traffic or load, until
>>> all users of the resource have been freed.
>>
>> Thank you very much.
>>
>>>> The second one:
>>>> ==============
>>>>
>>>> Assume the master copy of a lock resource is on Node A, and Node B
>>>> holds a local copy of the lock resource. When the lock queues
>>>> change on the local copy on Node B, will the master copy on Node A
>>>> be updated simultaneously? If so, when more than one node has a
>>>> local copy of the same lock resource, how does the lock manager
>>>> handle the update of the master copy? Does it use another locking
>>>> mechanism to prevent corruption of the master copy?
>>>
>>> All locking happens on the master node. The local copy is just
>>> that, a copy. It is updated when the master confirms what has
>>> happened. The local copy is there mainly for rebuilding the
>>> resource table when a master leaves the cluster, and to keep track
>>> of locks that exist on the local node. The local copy is NOT
>>> complete. It only contains local users of a resource.
>>
>> Thanks again for the kind and detailed explanation.
>>
>> I am sorry to bother you again, but I have more questions. I
>> analysed /proc/cluster/dlm_dir and dlm_locks and found some strange
>> things. Please see below:
>>
>> From /proc/cluster/dlm_dir:
>>
>> In lock space [ABC]:
>> This node (node 2) has 445 lock resources in total, where
>> --328 are master lock resources
>> --117 are local copies of lock resources mastered on other nodes.
>>
>> ===============================
>> ===============================
>>
>> From /proc/cluster/dlm_locks:
>>
>> In lock space [ABC]:
>> There are 1678 lock resources in use, where
>> --1674 lock resources are mastered by this node (node 2)
>> --4 lock resources are mastered by other nodes, within which:
>> ----1 lock resource mastered on node 1
>> ----1 lock resource mastered on node 3
>> ----1 lock resource mastered on node 4
>> ----1 lock resource mastered on node 5
>>
>> A typical master lock resource in /proc/cluster/dlm_locks is:
>> Resource 000001000de4fd88 (parent 0000000000000000).
>> Name (len=24) " 3 5fafc85"
>> Master Copy
>> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> Granted Queue
>> 1ff5036d NL Remote: 4 000603e8
>> 80d2013f NL Remote: 5 00040214
>> 00240209 NL Remote: 3 0001031d
>> 00080095 NL Remote: 1 00040197
>> 00010304 NL
>> Conversion Queue
>> Waiting Queue
>>
>> After searching for local copies in /proc/cluster/dlm_locks, I got:
>> Resource 000001002a273618 (parent 0000000000000000).
>> Name (len=16) "withdraw 3......"
>> Local Copy, Master is node 3
>> Granted Queue
>> 0004008d PR Master: 0001008c
>> Conversion Queue
>> Waiting Queue
>>
>> --
>> Resource 000001003fe69b68 (parent 0000000000000000).
>> Name (len=16) "withdraw 5......"
>> Local Copy, Master is node 5
>> Granted Queue
>> 819402ef PR Master: 00010317
>> Conversion Queue
>> Waiting Queue
>>
>> --
>> Resource 000001002a2732e8 (parent 0000000000000000).
>> Name (len=16) "withdraw 1......"
>> Local Copy, Master is node 1
>> Granted Queue
>> 000401e9 PR Master: 00010074
>> Conversion Queue
>> Waiting Queue
>>
>> --
>> Resource 000001004a32e598 (parent 0000000000000000).
>> Name (len=16) "withdraw 4......"
>> Local Copy, Master is node 4
>> Granted Queue
>> 1f5b0317 PR Master: 00010203
>> Conversion Queue
>> Waiting Queue
>>
>> These four local copies of lock resources have been sitting in
>> /proc/cluster/dlm_locks for several days.
>>
>> Now my questions:
>> 1. In my case, for the same lock space, the number of master lock
>> resources reported by dlm_dir is much SMALLER than that reported in
>> dlm_locks. My understanding is that the number of master lock
>> resources listed in dlm_dir must be larger than, or at least the
>> same as, the number reported in dlm_locks. The situation I
>> discovered on this node does not make any sense to me. Am I missing
>> anything? Can you help me clarify the case?
>
> I have found the answer. Yes, I did miss something. I need to sum the
> lock resources mastered by this node across all cluster members. In
> this case, the total number of lock resources mastered by the node is
> 1674, which matches the number reported by dlm_locks. Sorry for
> asking the question without thinking it through.
>
>> 2. What can cause "withdraw ...." to be the lock resource name?
>
> After reading the gfs source code, it seems that this is caused by
> issuing a command like "gfs_tool withdraw <mountpoint>". However, I
> checked all command histories on all nodes in the cluster and did not
> find any command like this. This question and the next question
> remain open. Please help.

You might like to ask GFS-specific questions in a new thread. I don't
know about GFS, and the people who do are probably not reading this
one by now ;-)

>> 3. These four local copies of lock resources have not been released
>> for at least several days, as far as I know. How can I find out
>> whether they are stuck in some dead situation or are still waiting
>> for the lock manager to release them? How can I change the timeout?

There is no lock timeout for local copies. If a lock is shown in
dlm_locks then either the lock is active somewhere or you have found a
bug! Bear in mind that this is a DLM answer; GFS does cache locks, but
I don't know the details.

--
Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
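
For what it's worth, the per-master tally behind question 1 is easy to
reproduce from a saved copy of the dlm_locks output. The short Python
sketch below is a hypothetical helper, not part of any DLM tooling: it
counts "Master Copy" and "Local Copy, Master is node N" entries using
the line layout shown in the excerpts quoted above, so on node 2 it
should report 1674 resources mastered locally and one each mastered on
nodes 1, 3, 4 and 5. The exact /proc format can vary between DLM
versions, so treat the regexes as assumptions to verify against your
own output. The dlm_dir side of the reconciliation is the same idea,
i.e. count the directory entries that point at node 2 in every cluster
member's dlm_dir and sum them, but since the dlm_dir line format is
not shown in this thread the sketch sticks to dlm_locks.

#!/usr/bin/env python
# Rough tally of a saved /proc/cluster/dlm_locks dump by master node.
# The patterns below are taken from the excerpts quoted in this thread
# ("Master Copy" / "Local Copy, Master is node N"); the exact layout
# may differ between DLM versions, so check them against your output.

import re
import sys
from collections import defaultdict

MASTER_HERE = re.compile(r"^Master Copy\b")
MASTER_REMOTE = re.compile(r"^Local Copy, Master is node (\d+)")

def tally(lines):
    """Return (count mastered locally, {remote master node id: count})."""
    local = 0
    remote = defaultdict(int)
    for raw in lines:
        line = raw.strip()
        if MASTER_HERE.match(line):
            local += 1
        else:
            m = MASTER_REMOTE.match(line)
            if m:
                remote[int(m.group(1))] += 1
    return local, dict(remote)

if __name__ == "__main__":
    # Usage: python dlm_locks_tally.py <saved dlm_locks output>
    with open(sys.argv[1]) as f:
        local, remote = tally(f)
    print("mastered on this node: %d" % local)
    for node, count in sorted(remote.items()):
        print("mastered on node %d: %d" % (node, count))

Save the dlm_locks output for the lock space on each node to a file
(e.g. /tmp/dlm_locks.node2, a name used here only for illustration)
and run the script against each file to get the per-node breakdown.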