On 19/11/2008, at 2:06 AM, David Teigland wrote:
> On Tue, Nov 18, 2008 at 05:14:38PM +1030, Tom Lanyon wrote:
>> We seem to be having the same problem on a 5 node virtual cluster
>> where 3 of the nodes share a GFS mount.
>> A backup script runs on one node which does some heavy reads + writes
>> to this mount at which point all three nodes jump to 100% cpu (90%
>> iowait on the machine that is doing the backup, 100% system on the
>> other two) and all LVM VGs, LVs and GFS mounts lock up.
>
> Which process was using 100% cpu? If it was groupd, fenced,
> dlm_controld or gfs_controld, then yes it may be the same problem.
>
>> Is there anything that could be tuned here to avoid this issue until a
>> bug fix is released?
>
> I don't think there's any way to avoid the bug in the bz I referenced.
>
> Dave

We haven't been able to catch it quickly enough to determine which
process is using all the CPU.

The other possibility is that we're just seeing a huge number of glocks
being created on the node running the backups, and all the other nodes
(webservers) are just hanging whilst trying to access files. I've just
done some fairly aggressive tuning of the GFS mounts on all nodes;
hopefully this fixes it!

Regards,
Tom
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster