Hi,
We've been having problems with writes to our GFS file system: a write
will pause for long periods (typically 5 to 10 seconds, sometimes 30
seconds, and occasionally 5 minutes). After the pause, it's as if
nothing happened; whatever the process is, it just keeps going, happy
as can be.
Except for these pauses, our GFS is quite zippy, both reads and writes.
But these pauses are holding us back from going full production.
I need to know what tools I should use to figure out what is causing
these pauses.
Here is the setup.
-------------------
All machines: RHEL 4 update 1 (OK, actually Scientific Linux 4.1), kernel
2.6.9-11.ELsmp, GFS 6.1.0, ccs 1.0.0, gulm 1.0.0, rgmanager 1.9.34
I have no ability to do fencing yet, so I chose to use the gulm locking
mechanism. I have it set up so that there are 3 lock servers, for
failover. I have tested the failover, and it works quite well.
I have 5 machines in the cluster. 1 isn't connected to the SAN and
doesn't use GFS. It is just a failover gulm lock server in case the
other two lock servers go down.
So I have 4 machines connected to our SAN and using GFS. 3 are
read-only, 1 is read-write. If it is important, the 3 read-only are
x86_64, the 1 read-write and the 1 not connected are i386.
The read/write machine is our master lock server. Then one of the
read-only is a fallback lock server, as is the machine not using GFS.
----------------
We're getting these pauses when writing, and I'm having a hard time
tracking down where the problem is. I *think* that we can still read
from the other machines while a write is paused, but since the problem
comes and goes, I haven't been able to verify that.
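As a starting point for pinning down when the pauses happen, a small
probe along these lines could timestamp each stall on the writer, so the
times can be compared against reads on the other nodes and against the
lock server logs. This is only a sketch, assuming a Python 3 interpreter
is available on the node; the function names, the 2-second threshold,
and any path passed in are hypothetical and not part of GFS or gulm.

```python
import os
import time


def timed_write(path, payload=b"x" * 4096):
    """Append payload to path, fsync it, and return the elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data out so a stall shows up in the timing
    finally:
        os.close(fd)
    return time.monotonic() - start


def probe(path, count, interval=1.0, threshold=2.0, log=print):
    """Do `count` timed writes; log and return the ones at or over threshold."""
    stalls = []
    for _ in range(count):
        elapsed = timed_write(path)
        if elapsed >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            stalls.append((stamp, elapsed))
            log("%s write stalled for %.1f s" % (stamp, elapsed))
        time.sleep(interval)
    return stalls
```

Calling something like probe("/mnt/gfs/probe.dat", count=3600) on the
read-write machine (the path is made up) would run for about an hour and
print one line per stalled write.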
Anyway, which tools do you think would be best in diagnosing this?
Many Thanks
Troy Dawson
--
__________________________________________________
Troy Dawson dawson@xxxxxxxx (630)840-6468
Fermilab ComputingDivision/CSS CSI Group
__________________________________________________