> > A couple of further questions about the master copy of lock resources.
> >
> > The first one:
> > =============
> >
> > Again, assume:
> > 1) Node A is extremely busy and handles all requests
> > 2) the other nodes are idle and have never handled any requests
> >
> > According to the documents, Node A will initially hold all the master
> > copies. What I am unclear about is whether the lock manager will
> > evenly redistribute the master copies on Node A to other nodes when it
> > thinks the number of master copies on Node A is too large?
>
> Locks are only remastered when a node leaves the cluster. In that case
> all of its locks will be moved to another node. We do not do dynamic
> remastering - a resource that is mastered on one node will stay mastered
> on that node regardless of traffic or load, until all users of the
> resource have been freed.

Thank you very much.

> > The second one:
> > ==============
> >
> > Assume the master copy of a lock resource is on Node A, and Node B
> > holds a local copy of the lock resource. When the lock queues change
> > on the local copy on Node B, will the master copy on Node A be updated
> > simultaneously? If so, when more than one node has a local copy of the
> > same lock resource, how does the lock manager handle the update of the
> > master copy? Using another lock mechanism to prevent corruption of the
> > master copy?
>
> All locking happens on the master node. The local copy is just that, a
> copy. It is updated when the master confirms what has happened. The
> local copy is there mainly for rebuilding the resource table when a
> master leaves the cluster, and to keep track of locks that exist on the
> local node. The local copy is NOT complete; it only contains local
> users of a resource.

Thanks again for the kind and detailed explanation. I am sorry I have to
bother you again, as I am having more questions.
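For reference, the Master/Local counts I quote below were tallied by grepping a saved snapshot of the proc file. A minimal sketch of how one might do that (the two-record here-doc sample and the /tmp path are only illustrative; on a live node you would snapshot the real file first):

```shell
#!/bin/sh
# On a real node, snapshot the live file instead of the sample below:
#   cat /proc/cluster/dlm_locks > /tmp/dlm_locks.dump
# Here we build a tiny sample in the same layout so the sketch is
# self-contained.
cat > /tmp/dlm_locks.dump <<'EOF'
Resource 000001000de4fd88 (parent 0000000000000000). Name (len=24) " 3 5fafc85"
Master Copy
Granted Queue
Resource 000001002a273618 (parent 0000000000000000). Name (len=16) "withdraw 3......"
Local Copy, Master is node 3
Granted Queue
EOF

# Count resources mastered here vs. local copies of remote masters.
masters=$(grep -c '^Master Copy' /tmp/dlm_locks.dump)
locals=$(grep -c '^Local Copy' /tmp/dlm_locks.dump)
echo "masters=$masters locals=$locals"
```

Running this on the sample prints `masters=1 locals=1`; on a real snapshot the two counts should add up to the number of `Resource` lines in the dump.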
I analysed /proc/cluster/dlm_dir and dlm_locks and found some strange
things. Please see below.

From /proc/cluster/dlm_dir, in lock space [ABC], this node (node 2) has
445 lock resources in total, where:
- 328 are master lock resources
- 117 are local copies of lock resources mastered on other nodes

===============================

From /proc/cluster/dlm_locks, in lock space [ABC], there are 1678 lock
resources in use, where:
- 1674 lock resources are mastered by this node (node 2)
- 4 lock resources are mastered by other nodes, namely:
  - 1 lock resource mastered on node 1
  - 1 lock resource mastered on node 3
  - 1 lock resource mastered on node 4
  - 1 lock resource mastered on node 5

A typical master lock resource in /proc/cluster/dlm_locks is:

Resource 000001000de4fd88 (parent 0000000000000000). Name (len=24) " 3 5fafc85"
Master Copy
LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted Queue
1ff5036d NL Remote:   4 000603e8
80d2013f NL Remote:   5 00040214
00240209 NL Remote:   3 0001031d
00080095 NL Remote:   1 00040197
00010304 NL
Conversion Queue
Waiting Queue

After searching for local copies in /proc/cluster/dlm_locks, I got:

Resource 000001002a273618 (parent 0000000000000000). Name (len=16) "withdraw 3......"
Local Copy, Master is node 3
Granted Queue
0004008d PR Master:     0001008c
Conversion Queue
Waiting Queue
--
Resource 000001003fe69b68 (parent 0000000000000000). Name (len=16) "withdraw 5......"
Local Copy, Master is node 5
Granted Queue
819402ef PR Master:     00010317
Conversion Queue
Waiting Queue
--
Resource 000001002a2732e8 (parent 0000000000000000). Name (len=16) "withdraw 1......"
Local Copy, Master is node 1
Granted Queue
000401e9 PR Master:     00010074
Conversion Queue
Waiting Queue
--
Resource 000001004a32e598 (parent 0000000000000000). Name (len=16) "withdraw 4......"
Local Copy, Master is node 4
Granted Queue
1f5b0317 PR Master:     00010203
Conversion Queue
Waiting Queue

These four local copies of lock resources have been sitting in
/proc/cluster/dlm_locks for several days.

Now my questions:

1. In my case, for the same lock space, the number of master lock
resources reported by dlm_dir is much SMALLER than the number reported
in dlm_locks. My understanding is that the count of master lock
resources listed in dlm_dir must be larger than, or at least equal to,
the count reported in dlm_locks. The situation I see on this node does
not make any sense to me. Am I missing anything? Can you help me
clarify this?

2. What can cause "withdraw ...." to appear as a lock resource name?

3. As far as I can tell, these four local copies of lock resources have
not been released for at least several days. How can I find out whether
they are stuck in some dead situation or are still waiting for the lock
manager to release them? And how do I change the timeout?

Thank you very much in advance for your further help.

Jas

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster