Re: Problem in clvmd/dlm_recoverd

Tom Lanyon <tom@xxxxxxxxxxxxxx> · Wed, 19 Nov 2008 15:00:33 +1030

On 19/11/2008, at 2:06 AM, David Teigland wrote:

> On Tue, Nov 18, 2008 at 05:14:38PM +1030, Tom Lanyon wrote:
>> We seem to be having the same problem on a 5-node virtual cluster
>> where 3 of the nodes share a GFS mount.
>>
>> A backup script runs on one node which does some heavy reads + writes
>> to this mount, at which point all three nodes jump to 100% CPU (90%
>> iowait on the machine that is doing the backup, 100% system on the
>> other two) and all LVM VGs, LVs and GFS mounts lock up.
>
> Which process was using 100% CPU?  If it was groupd, fenced,
> dlm_controld or gfs_controld, then yes it may be the same problem.
>
>> Is there anything that could be tuned here to avoid this issue
>> until a bug fix is released?
>
> I don't think there's any way to avoid the bug in the bz I referenced.
>
> Dave

We haven't been able to catch it quickly enough to determine which
process is using all the CPU.
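
One way to catch the culprit next time might be to leave a small sampler
running on each node. The following is only a rough sketch, not something
we have run on the cluster: it assumes a Linux /proc layout, and the names
parse_stat, snapshot, busiest and watch are made up for illustration. It
diffs the utime + stime tick counters from /proc/<pid>/stat between two
samples and prints the processes that used the most CPU in between. Kernel
threads have /proc entries too, so they would be included.

```python
import os
import time


def parse_stat(text):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm sits in parentheses and may contain spaces, so split on the last ')'.
    head, _, tail = text.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); utime is field 14, stime is field 15.
    return comm, int(fields[11]) + int(fields[12])


def snapshot(proc="/proc"):
    """Map pid -> (comm, cpu ticks) for every process currently readable."""
    ticks = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                ticks[int(entry)] = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited between listdir and open, or odd line
    return ticks


def busiest(before, after, top=5):
    """Return [(tick delta, pid, comm)] for the biggest CPU users, largest first."""
    deltas = [
        (t - before[pid][1], pid, comm)
        for pid, (comm, t) in after.items()
        if pid in before
    ]
    deltas.sort(reverse=True)
    return deltas[:top]


def watch(interval=5, rounds=1):
    """Print the busiest processes once per interval, for the given rounds."""
    for _ in range(rounds):
        before = snapshot()
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, busiest(before, after), flush=True)
```

Calling watch() with a large rounds value and redirecting the output to a
file on local disk (not on the GFS mount, which is what locks up) should
leave a record of what was busy just before the hang.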

The other possibility is that we're just seeing a huge number of glocks
created on the node running backups, and all the others (webservers) are
just hanging whilst trying to access files. I've just done some fairly
aggressive tuning of the GFS mounts on all nodes; hopefully this fixes
it!

Regards,
Tom

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster