We have a customer who we believe is putting excessive locking
pressure on one of several gfs volumes (9 total across 5 systems).
They've started to see occasional load spikes during which the GFS
filesystem appears to be "locked up" for a minute or two. Without any
action on our part the load spikes clear and everything continues as
normal.
And we've recently seen the following log entries:
Sep 2 12:57:57 xc88-s00007 kernel: lock_dlm: gdlm_cancel 1,2 flags 0
Sep 2 12:57:57 xc88-s00007 kernel: lock_dlm: gdlm_cancel skip 1,2 flags 0
Sep 2 12:57:58 xc88-s00007 kernel: lock_dlm: gdlm_cancel 1,2 flags 0
Sep 2 12:57:58 xc88-s00007 kernel: lock_dlm: gdlm_cancel skip 1,2 flags 0
Sep 2 12:58:40 xc88-s00007 kernel: lock_dlm: gdlm_cancel 1,2 flags 0
Sep 2 12:58:40 xc88-s00007 kernel: lock_dlm: gdlm_cancel skip 1,2 flags 0
Sep 2 12:58:58 xc88-s00007 kernel: lock_dlm: gdlm_cancel 1,2 flags 0
Sep 2 12:58:58 xc88-s00007 kernel: lock_dlm: gdlm_cancel skip 1,2 flags 0
Sep 2 12:59:14 xc88-s00007 kernel: lock_dlm: gdlm_cancel 1,2 flags 0
Sep 2 12:59:14 xc88-s00007 kernel: lock_dlm: gdlm_cancel skip 1,2 flags 0
For all intents and purposes we're running RHCS2 from RHEL 5.2 with
the RHEL 5.2 kernel (2.6.18-92.1.10).
This used to happen to this customer a lot more frequently on RHCS1
(1.03), but we upgraded them to the above RHCS2 packages and kernel
and things have been much better.
I'm going to start dumping gfs_tool counters data for the various gfs
filesystems.
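A rough sketch of how that periodic counters dump might look, in case
it helps anyone else: the mount points, output directory, and sample
interval below are placeholders, not our actual setup -- substitute
your own values.

```shell
#!/bin/sh
# Snapshot `gfs_tool counters` output for each GFS mount so lock
# statistics can be correlated with the load spikes afterward.
# MOUNTS and OUTDIR are hypothetical examples; override as needed.
MOUNTS="${MOUNTS:-/gfs/vol1 /gfs/vol2}"
OUTDIR="${OUTDIR:-/var/log/gfs-counters}"

dump_counters() {
    ts=$(date +%Y%m%d-%H%M%S)
    for mnt in $MOUNTS; do
        mkdir -p "$OUTDIR"
        # One timestamped file per mount per sample, named after the
        # last component of the mount point.
        gfs_tool counters "$mnt" > "$OUTDIR/$(basename "$mnt")-$ts.txt"
    done
}

# Sample every 30 seconds, e.g. under nohup or from an init script:
#   while true; do dump_counters; sleep 30; done
```

Diffing consecutive snapshots for a mount should make it easy to see
which counters jump during a spike.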
Any advice on tracking this down would be appreciated.
Thanks!
--
Edward Muller
Engine Yard Inc. : Support, Scalability, Reliability
+1.866.518.9273 x209 - Mobile: +1.417.844.2435
IRC: edwardam - XMPP/GTalk: emuller@xxxxxxxxxxxxxx
Pacific/US
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster