Wendy Cheng wrote:
Terry wrote:
On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201@xxxxxxxxx> wrote:
On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng
<s.wendy.cheng@xxxxxxxxx> wrote:
Hi, Terry,
I am still seeing some high load averages. Here is an example of a GFS
configuration. I left statfs_fast off as it would not apply to one of my
volumes for an unknown reason. Not sure that would have helped anyway. I do,
however, feel that reducing scand_secs helped a little:
Sorry I missed scand_secs (I was mindless, as my brain was mostly occupied
by daytime work).
To simplify the view, glock states include exclusive (write), shared (read),
and not-locked (in reality, there are more). An exclusive lock has to be
demoted (after demote_secs) to shared, then to not-locked (another
demote_secs), before it is scanned (every scand_secs) and added to the
reclaim list, where it can be purged. During the transition from the
exclusive to the shared state, the file contents need to be flushed to disk
(to keep the file contents cluster-coherent). All of the above assumes the
file (protected by this glock) is not being accessed (idle).
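In case it helps, here is a rough sketch of how those knobs can be inspected
and tuned on a mounted GFS1 filesystem (the mount point and values below are
only placeholders; settune changes do not persist across a remount):

  gfs_tool gettune /mnt/gfs                   # show current tunable values
  gfs_tool settune /mnt/gfs demote_secs 100   # demote held glocks sooner
  gfs_tool settune /mnt/gfs scand_secs 5      # scan for purgeable glocks more often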
You have hit an area where GFS normally doesn't perform well. With GFS1 in
maintenance mode while GFS2 seems to be so far away, ext3 could be a better
answer. However, before switching, do make sure to test it thoroughly (since
ext3 could have the very same issue as well - check out:
http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
Did you look at (and test) the GFS "nolock" protocol (for single-node GFS)?
It bypasses some locking overhead and can be switched to DLM in the future
(just make sure you reserve enough journal space - the rule of thumb is one
journal per node, so know how many nodes you plan to have in the future).
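As a rough sketch of what that looks like in practice (the device path,
cluster/filesystem names, and journal count are only placeholders - adjust
them to your setup):

  # create the filesystem with the nolock protocol, but reserve one
  # journal per node you expect to run in the future (e.g. 4)
  gfs_mkfs -p lock_nolock -j 4 /dev/myvg/mylv

  # later, to move to the cluster-aware protocol, update the superblock
  # (with the filesystem unmounted everywhere)
  gfs_tool sb /dev/myvg/mylv proto lock_dlm
  gfs_tool sb /dev/myvg/mylv table mycluster:myfs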
-- Wendy
Good points. I could try the nolock feature, I suppose. I'm not quite clear
on how to reserve journal space. I forgot to post the CPU time; check out
this:
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 4822 root  10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
 4820 root  10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
 4821 root  10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
 3659 root  10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
 4823 root  11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
  367 root  10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
gfs_glockd is further down the list, so I am not so concerned with that right
now. It appears turning on nolock would do the trick. The times aren't
extremely accurate because I have failed this cluster over between nodes
while testing.
Here is some more testing information....
I created a new 1 TB volume on my iSCSI SAN and formatted it as ext3. I then
used dd to create a 100G file. This yielded roughly 900 MB/sec. I then
stopped my application and did the same thing on an existing GFS volume.
This gave me about 850 KB/sec. This isn't an iSCSI issue. This appears to be
a load issue and a matter of the number of I/Os occurring on these volumes.
That said, I would expect the changes I made to result in a major performance
improvement. Since they didn't, what other points could I consider? If it's
a GFS issue, ext3 is the way to go - maybe even switch to active-active on
my NFS cluster. If it's a backend disk issue, I would expect to see the
throughput on my iSCSI link (bond1) fully utilized. It's not. Could I be
thrashing the disks? This is an iSCSI SAN with 30 SATA disks. Just bouncing
some thoughts around to see if anyone has any more thoughts.
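(One way I could check whether the disks or the link are the bottleneck - a
sketch using sysstat's iostat/sar, assuming bond1 is the iSCSI-facing
interface:

  iostat -x 5    # per-device utilization and wait times on the iSCSI LUNs
  sar -n DEV 5   # per-interface throughput, including bond1

)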
Really need to focus on my daytime job - its workload has been climbing ...
but can't help placing a quick comment here ...
The 900 MB/s vs. 850 KB/s difference looks like a caching issue - that is,
for 900 MB/s it looks like the data was still lingering in the system cache,
while in the 850 KB/s case the data might already have hit disk. A cluster
filesystem normally syncs more by its nature. In general, ext3 does perform
better in a single-node environment, but the difference should not be as big
as above.
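To make the comparison fair, the dd runs need to force the data to disk on
both filesystems. A sketch (the paths are placeholders, and conv=fdatasync /
oflag=direct availability depends on the coreutils version installed):

  # flush the file to storage before dd reports its rate
  dd if=/dev/zero of=/mnt/ext3/testfile bs=1M count=10240 conv=fdatasync
  dd if=/dev/zero of=/mnt/gfs/testfile bs=1M count=10240 conv=fdatasync

  # or bypass the page cache entirely
  dd if=/dev/zero of=/mnt/gfs/testfile bs=1M count=10240 oflag=direct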
There are certainly more tuning knobs available (such as journal size and/or
network buffer size) to make a GFS-over-iSCSI "dd" run better, but it is
pointless. To deploy a cluster filesystem for production usage, the tuning
should not be driven by such a simple-minded command. You also have to
consider the support issues when deploying a filesystem. GFS1 is a little bit
out of date, and any new development and/or significant performance
improvements would likely go into GFS2, not GFS1. Research GFS2 (google to
see what other people have said about it) to understand whether its direction
fits your needs (so you can migrate from GFS1 to GFS2 if you bump into any
showstopper in the future). If not, ext3 (with ext4 actively developed) is a
fine choice, if I read your configuration right from previous posts.
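(For reference, the in-place migration path is the gfs2_convert tool - only a
sketch here, with a placeholder device path; take a full backup and fsck
first, since the conversion is one-way:

  gfs_fsck /dev/myvg/mylv       # make sure the GFS1 filesystem is clean
  gfs2_convert /dev/myvg/mylv   # convert the filesystem in place to GFS2

)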
Or ... there is a known GFS1 writepage issue if most of your files are very
big ... The problem is fixed in RHEL kernels, though. What is your kernel
version?
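(A quick way to report that - the package names below assume RHEL 5 style
packaging:

  uname -r                    # running kernel
  rpm -q kmod-gfs gfs-utils   # GFS1 kernel module and userland versions

)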
-- Wendy