Re: Cluster failure, dlm overload

"wsfax alu.es" <wsfax.alu.es@xxxxxxxxx> · Wed, 11 Apr 2012 17:17:29 +0200

An update on this problem.

We have found that the loop causing the "dlm" overload is:

  1. Node 1 sends a "lookup" message for a given filesystem and inode
     to the master node (node 3), asking for the current owner of
     this element.
  2. Node 3 replies "the owner of this element is now node 4".
  3. Node 1 sends a "request" message to node 4.
  4. Node 4 replies "I do not have it" (error code EBADR = -53).
  5. Go to step 1.
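
For anyone trying to picture the loop above, here is a minimal toy model
in Python. It is only a sketch of the message pattern, not the kernel DLM
code: the Node class, the acquire() function, the resource name and the
max_attempts cap are all hypothetical. In the real cluster there is no
such cap, which is why the loop keeps running.

```python
# Toy model of the lookup/request loop. NOT the kernel DLM code:
# every name here is hypothetical and exists only for illustration.

EBADR = 53  # Linux errno value; the failing reply carries -EBADR = -53

RES = "fs0:inode-1234"  # hypothetical filesystem/inode identifier


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.directory = {}  # element -> owner id (only used on the master)
        self.owned = set()   # elements this node really holds

    def lookup(self, element):
        # Master answers "the owner of this element is node N".
        return self.directory[element]

    def request(self, element):
        # Supposed owner answers 0 if it holds the element, else -EBADR.
        return 0 if element in self.owned else -EBADR


def acquire(element, master, nodes, max_attempts=5):
    """Return (attempts, messages, last_status) for one requester.

    max_attempts is an assumption added so the sketch terminates.
    """
    messages = 0
    status = -EBADR
    for attempt in range(1, max_attempts + 1):
        owner_id = master.lookup(element)         # lookup + reply
        messages += 2
        status = nodes[owner_id].request(element)  # request + reply
        messages += 2
        if status == 0:
            return attempt, messages, status
        # The master's answer never changes, so the next attempt is identical.
    return max_attempts, messages, status


# Node 3 is the master and believes node 4 owns the element; node 4 does not.
nodes = {3: Node(3), 4: Node(4)}
nodes[3].directory[RES] = 4

result = acquire(RES, nodes[3], nodes)
print(result)
```

The point of the sketch is that nothing in the exchange ever updates the
stale owner record on node 3, so every attempt costs four messages and
fails in exactly the same way.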

This loop happens several hundred times per second, multiplied by all
the filesystems and inodes with this problem. In total, that amounts to
several tens of thousands of DLM messages, and it continues until the
cluster is restarted.
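
As a rough sanity check on those figures, the arithmetic can be sketched
as below. The inputs (300 iterations per second, 20 affected elements) are
made-up illustrative values, not measurements from this cluster, and
treating the total as a per-second rate is also an assumption. The only
part taken from the description is that one pass of the loop involves
four messages (lookup, lookup reply, request, error reply).

```python
# Back-of-the-envelope estimate; the input values are illustrative only.

MESSAGES_PER_ITERATION = 4  # lookup, lookup reply, request, error reply


def dlm_message_rate(iterations_per_second, affected_elements):
    """Estimated DLM messages per second generated by the loop."""
    return iterations_per_second * MESSAGES_PER_ITERATION * affected_elements


# Hypothetical example: 300 iterations/s on each of 20 affected inodes.
rate = dlm_message_rate(300, 20)
print(rate)
```

With inputs of that order the estimate lands in the tens of thousands of
messages per second, which is consistent with the figure quoted above.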

Kind regards.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster