Re: gfs2_tool settune demote_secs

Scooter Morris <scooter@xxxxxxxxxxxx> · Mon, 12 Oct 2009 05:57:54 -0700

Steve,

    Thanks for the informative and detailed response -- it really
helps to understand what might be happening.  We're not mounting with
noatime, and it sounds like that would be a good first step.

Thanks!

-- scooter

Steven Whitehouse wrote:

  Hi,

On Fri, 2009-10-09 at 10:57 -0700, Scooter Morris wrote:

    Steve,
    Thanks for the prompt reply.  Like Kaerka, I'm running on
large-memory servers and decreasing demote_secs from 300 to 20
resulted in significant performance improvements because locks get
freed much more quickly (I assume), resulting in much better response.
It could certainly be that changing demote_secs was a workaround for a
different bug that has now been fixed, which would be great.  I'll try
some tests today and see how "rm -rf" on a large directory behaves.

-- scooter

  The question though, is why that should result in a better response. It
doesn't really make sense, since the caching of the "locks" (really
caching of data and metadata controlled by a lock) should improve the
performance due to more time to write out the dirty data.

Doing an "rm -fr" is also a very different workload to that of reading
all the files in the filesystem once (for backup purposes for example)
since the "rm -fr" requires writing to the fs and the backup process
doesn't do any writing.

How long it takes to remove a file also depends to a large extent on its
size.

In both cases, however, it would improve performance if you could arrange
to remove or read inodes in inode number order. Both GFS and GFS2
return inodes from getdents64 (readdir) in a pseudo-random order based
on the hash of the filename. You can gain a lot of performance if these
results are sorted before they are scanned.
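
As a rough illustration of the sorting idea above, here is a minimal Python sketch. It assumes a single flat directory and that the inode number reported by readdir is a reasonable proxy for on-disk order. The helper names entries_in_inode_order and remove_files_in_inode_order are invented for this example and are not part of any GFS2 tooling.

```python
import os


def entries_in_inode_order(directory):
    """Return the paths in `directory`, sorted by inode number.

    os.scandir() yields entries in whatever order the filesystem's
    readdir returns them (hash order on GFS/GFS2, per the text above),
    so the entries are collected first and sorted before any file is
    touched.
    """
    with os.scandir(directory) as it:
        # DirEntry.inode() comes from the directory listing itself.
        entries = [(entry.inode(), entry.path) for entry in it]
    entries.sort()
    return [path for _inode, path in entries]


def remove_files_in_inode_order(directory):
    """Unlink the regular files in `directory` in inode-number order.

    Subdirectories and symlinks are left alone. Returns the number of
    files removed.
    """
    removed = 0
    for path in entries_in_inode_order(directory):
        if os.path.isfile(path) and not os.path.islink(path):
            os.unlink(path)
            removed += 1
    return removed
```

The same ordering can be applied to a read-only scan (a backup, say) by iterating over entries_in_inode_order() and opening each file in turn. Whether it helps on a given cluster is something to measure rather than assume.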

Ideally we'd return them from the fs in sorted order. Unfortunately a
design decision which was made a long time ago, in combination with the
design of the Linux VFS, prevents us from doing that.

If there is a problem with a node caching the whole filesystem after it
has been scanned, then it is still possible to solve this issue:

echo 3 > /proc/sys/vm/drop_caches

I guess I should also point out that it is a good idea to mount with the
noatime mount option if there is going to be a read-only scan of the
complete filesystem on a regular basis, since that will prevent that
becoming a "write to every inode" scan. That will also make a big
performance difference. Note that it's OK (in recent kernels) to mount a
GFS2 filesystem more than once with different atime flags (using bind
mounts) in case you have an application which requires atime, but you
want to avoid it when running a backup.
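
To check which GFS2 mounts on a node would still update atime during a scan, something along these lines may help. It is only a sketch: it assumes the usual /proc/mounts layout (device, mount point, type, comma-separated options) and that the type field reads "gfs2". The function name gfs2_mounts_updating_atime is made up for this example.

```python
def gfs2_mounts_updating_atime(mounts_text):
    """Return the mount points of gfs2 filesystems mounted without noatime.

    `mounts_text` is the content of /proc/mounts: one mount per line,
    with whitespace-separated fields (device, mount point, type,
    options, ...). Mounts using relatime are also reported, since
    relatime reduces atime writes but does not stop them.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not gfs2.
        if len(fields) < 4 or fields[2] != "gfs2":
            continue
        options = fields[3].split(",")
        if "noatime" not in options:
            flagged.append(fields[1])
    return flagged
```

Feeding it the text read from /proc/mounts lists the mount points that a backup should avoid, or that could be given a second noatime bind mount as described above.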

There is also /proc/sys/vm/vfs_cache_pressure, which may help
optimise your workload.
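
For experimenting with that setting, a small Python sketch follows. The kernel default is 100, and larger values make the kernel reclaim the dentry and inode caches more aggressively. The helper names are invented for this example, the path argument exists only so the code can be tried against an ordinary file, and writing to the real file needs root.

```python
VFS_CACHE_PRESSURE = "/proc/sys/vm/vfs_cache_pressure"


def read_cache_pressure(path=VFS_CACHE_PRESSURE):
    """Return the current vfs_cache_pressure setting as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def write_cache_pressure(value, path=VFS_CACHE_PRESSURE):
    """Set vfs_cache_pressure to `value`.

    Raises ValueError for negative values instead of passing them on
    to the kernel.
    """
    if value < 0:
        raise ValueError("vfs_cache_pressure must not be negative")
    with open(path, "w") as f:
        f.write(f"{value}\n")
```

A reasonable way to use it is to record the old value with read_cache_pressure(), try a higher one for the duration of a scan, and restore the original afterwards. The setting does not persist across reboots unless it is also added to sysctl.conf.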

... and if all that fails, then the next thing to do is to use
blktrace/seekwatcher to find out what's really going on on the disk, and
send the results so that we can have a look and see if we can improve
the disk I/O. Better still if you can combine that with a trace from the
gfs2 tracepoints so we can see the locking at the same time,

Steve.

    Kaerka Phillips wrote: 

      If in gfs2 glocks are purged based upon memory constraints, what
happens if it is run on a box with large amounts of memory? i.e.
RHEL5.x with 128gb ram?  We ended up having to move away from GFS2
due to serious performance issues with this exact setup, and our
performance issues were largely centered around commands like ls or
rm against gfs2 filesystems with large directory structures and
millions of files in them.

In our case, something as simple as copying a whole filesystem to
another filesystem would cause a load avg of 50 or more, and would
take 8+ hours to complete.  The same thing on NFS or ext3 would take
usually 1 to 2 hours.  Netbackup of 10 of those filesystems took ~40
hours to complete, so we were getting maybe 1 good backup per week,
and in some cases the backup itself caused a cluster crash.

We are still using our GFS1 clusters, since as long as their network
is stable, their performance is very good, but we are phasing out
most of our GFS2 clusters to NFS instead.

On Fri, Oct 9, 2009 at 1:01 PM, Steven Whitehouse
<swhiteho@xxxxxxxxxx> wrote:
        Hi,

        On Fri, 2009-10-09 at 09:55 -0700, Scooter Morris wrote:
        > Hi all,
        >     On RHEL 5.3/5.4(?) we had changed the value of demote_secs to
        > significantly improve the performance of our gfs2 filesystem for certain
        > tasks (notably rm -r on large directories).  I recently noticed that
        > that tuning value is no longer available (part of a recent update, or
        > part of 5.4?).  Can someone tell me what, if anything replaces this?  Is
        > it now a mount option, or is there some other way to tune this value?
        >
        > Thanks in advance.
        >
        > -- scooter
        >

        > --
        > Linux-cluster mailing list
        > Linux-cluster@xxxxxxxxxx
        > https://www.redhat.com/mailman/listinfo/linux-cluster

        Nothing replaces it. The glocks are disposed of automatically on an LRU
        basis when there is enough memory pressure to require it. You can alter
        the amount of memory pressure on the VFS caches (including the glocks)
        but not specifically the glocks themselves.

        The idea is that it should be self-tuning now, adjusting itself to the
        conditions prevailing at the time. If there are any remaining
        performance issues though, we'd like to know so that they can be
        addressed,

        Steve.
