I have a script which runs gfs_quota to set quotas for all the users on my GFS
filesystem. When it's run simultaneously on two nodes, errors like the
following begin to appear:

  lock_dlm: lm_dlm_cancel 2,34 flags 80
  lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
  lock_dlm: complete dlm cancel 2,34 flags 40000
  ...
  lock_dlm: lm_dlm_cancel 2,34 flags 80
  lock_dlm: complete dlm cancel 2,34 flags 40000
  lock_dlm: lm_dlm_cancel rv 0 2,34 flags 80
  ...
  lock_dlm: lm_dlm_cancel 2,34 flags 84
  lock_dlm: lm_dlm_cancel skip 2,34 flags 84
  ...
  lock_dlm: lm_dlm_cancel 2,34 flags 80
  dlm: cancel granted 1350055
  lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40000
  lock_dlm: extra completion 2,34 5,5 id 1350055 flags 40000

and, more rarely:

  lock_dlm: lm_dlm_cancel 2,34 flags 80
  lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
  dlm: desktop-home-1: cancel reply ret -22
  lock_dlm: ast sb_status -22 2,34 flags 40000
  ...
  lock_dlm: lm_dlm_cancel 2,34 flags 80
  lock_dlm: lm_dlm_cancel rv -16 2,34 flags 40080

At the same time, I/O to the GFS partition hangs. Rebooting one of the two
nodes allows the cluster to recover.

On my smaller test cluster, I've been able to reproduce some of the errors:

  lock_dlm: lm_dlm_cancel 2,18 flags 84
  lock_dlm: lm_dlm_cancel rv 0 2,18 flags 40080
  lock_dlm: complete dlm cancel 2,18 flags 40000
  ...
  lock_dlm: lm_dlm_cancel 2,18 flags 80
  lock_dlm: lm_dlm_cancel skip 2,18 flags 0

though not the I/O hangs.

My shared storage is over AoE and I'm using the following packages:

  GFS-6.1.6-1
  dlm-1.0.1-1
  cman-1.0.11-0
  GFS-kernel-hugemem-2.6.9-60.9
  dlm-kernel-hugemem-2.6.9-44.9
  cman-kernel-hugemem-2.6.9-45.15
  kernel-hugemem-2.6.9-42.0.10.EL

I must admit, I've not been able to find out much about what dlm cancels are
or what triggers them. Can anyone shed some light on this?

Robert
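
P.S. In case it helps with reproducing this, the quota script is essentially a
loop of the following form (the user selection, limit value, and mount point
here are illustrative placeholders rather than my exact values):

  #!/bin/sh
  # Sketch only: set the same hard quota for every ordinary user on the GFS mount.
  MNT=/mnt/gfs     # GFS mount point (placeholder)
  LIMIT=1024       # hard limit; gfs_quota interprets sizes in megabytes by default
  for user in $(getent passwd | awk -F: '$3 >= 500 { print $1 }'); do
      gfs_quota limit -u "$user" -l "$LIMIT" -f "$MNT"
  done

It's when two nodes run this loop at the same time that the messages above
start to appear.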