lm_dlm_cancel during quota operations

Robert Clark <cluster@xxxxxxxxxxxxxx> · Fri, 20 Apr 2007 16:31:17 +0100

  I have a script which runs gfs_quota to set quotas for all the users
on my GFS filesystem. When it's run simultaneously on two nodes, errors
like the following begin to appear:

    lock_dlm: lm_dlm_cancel 2,34 flags 80
    lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
    lock_dlm: complete dlm cancel 2,34 flags 40000
...
    lock_dlm: lm_dlm_cancel 2,34 flags 80
    lock_dlm: complete dlm cancel 2,34 flags 40000
    lock_dlm: lm_dlm_cancel rv 0 2,34 flags 80
...
    lock_dlm: lm_dlm_cancel 2,34 flags 84
    lock_dlm: lm_dlm_cancel skip 2,34 flags 84
...
    lock_dlm: lm_dlm_cancel 2,34 flags 80
    dlm: cancel granted 1350055
    lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40000
    lock_dlm: extra completion 2,34 5,5 id 1350055 flags 40000

and, more rarely:

    lock_dlm: lm_dlm_cancel 2,34 flags 80
    lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
    dlm: desktop-home-1: cancel reply ret -22
    lock_dlm: ast sb_status -22 2,34 flags 40000
...
    lock_dlm: lm_dlm_cancel 2,34 flags 80
    lock_dlm: lm_dlm_cancel rv -16 2,34 flags 40080

At the same time, I/O to the GFS partition hangs. Rebooting one of the
two nodes allows the cluster to recover.

  On my smaller test cluster, I've been able to reproduce some of the
errors:

    lock_dlm: lm_dlm_cancel 2,18 flags 84
    lock_dlm: lm_dlm_cancel rv 0 2,18 flags 40080
    lock_dlm: complete dlm cancel 2,18 flags 40000
...
    lock_dlm: lm_dlm_cancel 2,18 flags 80
    lock_dlm: lm_dlm_cancel skip 2,18 flags 0

though not the I/O hangs.
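
  For comparing logs between nodes, here is a rough sketch that tallies
the lock_dlm messages per lock and picks out non-zero return codes. The
regular expression is inferred only from the sample lines quoted in this
post, not from the lock_dlm source, and the function name summarize is
made up for illustration. On Linux, -22 corresponds to EINVAL and -16 to
EBUSY.

```python
import errno
import re
from collections import Counter

# Layout inferred from the lock_dlm lines quoted in this post (an assumption):
#   lock_dlm: <event text> <type>,<number> [...] flags <hex>
LINE_RE = re.compile(
    r"lock_dlm: (?P<event>.+?) (?P<lock>\d+,\d+) .*?flags (?P<flags>[0-9a-f]+)\s*$"
)
# A trailing "rv N" or "sb_status N" in the event text carries a status code.
CODE_RE = re.compile(r"(?:rv|sb_status) (-?\d+)$")


def summarize(lines):
    """Return (Counter of (lock, event), list of (lock, event, errno name))."""
    events = Counter()
    failures = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # e.g. "dlm: cancel granted ..." lines carry no lock name
        event, lock = m.group("event"), m.group("lock")
        code = CODE_RE.search(event)
        # Drop the numeric code so "rv 0" and "rv -16" count as the same event.
        name = event[: code.start(1)].strip() if code else event
        events[(lock, name)] += 1
        if code and int(code.group(1)) != 0:
            n = abs(int(code.group(1)))
            failures.append((lock, name, errno.errorcode.get(n, str(n))))
    return events, failures


sample = """\
    lock_dlm: lm_dlm_cancel 2,34 flags 80
    lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
    lock_dlm: ast sb_status -22 2,34 flags 40000
    lock_dlm: lm_dlm_cancel rv -16 2,34 flags 40080
    lock_dlm: extra completion 2,34 5,5 id 1350055 flags 40000
""".splitlines()

events, failures = summarize(sample)
for (lock, name), count in sorted(events.items()):
    print(lock, name, count)
for failure in failures:
    print("non-zero status:", failure)
```

Feeding it the kernel log from each node (for example, line by line from
a saved dmesg capture) gives a quick count of which locks see repeated
cancels; it does not explain what triggers them.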

  My shared storage is accessed over AoE (ATA over Ethernet), and I'm
using the following packages:

GFS-6.1.6-1
dlm-1.0.1-1
cman-1.0.11-0
GFS-kernel-hugemem-2.6.9-60.9
dlm-kernel-hugemem-2.6.9-44.9
cman-kernel-hugemem-2.6.9-45.15
kernel-hugemem-2.6.9-42.0.10.EL

  I must admit, I've not been able to find out much about what dlm
cancels are or what triggers them. Can anyone shed some light on this?

	Robert
