Hi,

Quoting Steven Whitehouse <swhiteho@xxxxxxxxxx>:

> Hi,
>
> On Tue, 2009-01-20 at 22:32 -0500, Jeff Sturm wrote:
> > > -----Original Message-----
> > > From: linux-cluster-bounces@xxxxxxxxxx
> > > [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of
> > > nick@xxxxxxxxxxxxxxx
> > > Sent: Tuesday, January 20, 2009 5:19 AM
> > > To: linux-cluster@xxxxxxxxxx
> > > Subject: Directories with >100K files
> > >
> > > We have a GFS filesystem mounted over iSCSI. When doing an
> > > 'ls' on directories with several thousand files it takes
> > > around 10 minutes to get a response back -
> >
> > You don't say how many nodes you have, or anything about your
> > networking.
> >
> > Some general pointers:
> >
> > - A plain "ls" is probably much faster than any variant that fetches
> > inode metadata, e.g. "ls -l". The latter performs a stat() on each
> > individual file, which in turn triggers locking activity of some sort.
> > This is known to be slow on GFS1. (I've heard reports that GFS2 is/will
> > be better.)
> >
> The latest gfs1 is also much better. It is a tricky thing to do
> efficiently, and not doing the stats is a good plan.
>
> > - You want a fast, reliable, low-latency network for your cluster. Intel
> > GigE cards and a fast switch are a good bet.
> >
> > - Unless your application needs access times or quota support, mounting
> > with "noquota,noatime" is a good idea. Maybe also "nodiratime".
> >
> > > Can anyone recommend any GFS tunables to help us out here ?
> >
> > You could try bumping demote_secs up from its default of 5 minutes.
> > That'll cause locks to be held longer, so they may not need to be
> > reacquired so often. It won't help with the initial directory listing,
> > but should help on subsequent invocations.
> >
> > In your case, with "ls" taking 8 minutes to run, some locks initially
> > acquired during execution of the command have already been demoted by
> > the time it completes.
> >
> Also the question to ask is how many nodes are accessing this
> filesystem? If more than one node is accessing the same directory and at
> least one of those does a write (i.e. inode create/delete) within the
> demote_secs time, then the demote_secs time will not make much
> difference since the locks will be pushed out by the other node's access
> anyway.

We have 4 nodes in our test env and 5 in our prod env. The directory
structure is as follows:

[root@finapp4 ~]# cd /apps/prod/prodcomn/admin/
[root@finapp4 admin]# ls
inbound  install  log  out  outbound  scripts  trace
[root@finapp4 admin]# ls log/ out/
log/:
PROD_finapp1  PROD_finapp2  PROD_finapp3  PROD_finapp4  PROD_finapp5  WFSC_oracleprod

out/:
o14679499.out  o14798714.out  PROD_finapp2  PROD_finapp4  WFSC_oracleprod
o14698655.out  PROD_finapp1   PROD_finapp3  PROD_finapp5

The WFSC_oracleprod dirs in both the log/ and out/ directories each
contain over 120,000 small files. These WFSC_oracleprod dirs will be
accessed by all cluster members for both read and write operations.

If it helps to make it any clearer, these servers are clustered Oracle
Applications servers running concurrent managers.

> > > Should we set statfs_fast to 1 ?
> >
> > Probably good to set this, regardless.
> >
> > > What about glock_purge ?
> >
> > Glock_purge helps limit CPU time consumed by gfs_scand when a large
> > number of unused glocks are present. See
> > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
> > . This may make your system run better but I'm not sure it's going to
> > help with listing your giant directories.
> >
> Better to disable this altogether unless there is a very good reason to
> use it. It generally has the effect of pushing things out of cache early
> so is to be avoided.
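
For reference, this is roughly how I've been poking at the tunables
mentioned above. The gfs_tool gettune/settune syntax is as I understand it
from the RHEL GFS docs, and the demote_secs value is just one I'm
experimenting with, not a recommendation:

[root@finapp4 ~]# gfs_tool gettune /apps | egrep 'demote_secs|statfs_fast|glock_purge'
[root@finapp4 ~]# gfs_tool settune /apps statfs_fast 1
[root@finapp4 ~]# gfs_tool settune /apps demote_secs 600
[root@finapp4 ~]# gfs_tool settune /apps glock_purge 0

(glock_purge left at 0, i.e. disabled, per Steven's advice above. As far
as I can tell settune values don't survive a remount, so they'd have to
be reapplied on every node, e.g. from an init script.)

To time a listing that skips the per-file stat() Jeff mentioned, I've
been using an unsorted ls:

[root@finapp4 ~]# time ls -f /apps/prod/prodcomn/admin/log/WFSC_oracleprod | wc -l
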
> > > Here is the fstab entry for the GFS filesystem:
> > > /dev/vggfs/lvol00 /apps gfs _netdev 1 2
> >
> > Try "noatime,noquota" here.

We also have the Oracle DB server accessing the GFS /apps directory from
one of the Oracle Applications servers via NFS, which I reckon is not
helping performance. I'm trying to get the DBAs to give me a list of
directories to export instead of exporting the whole /apps partition.

In testing I can set statfs_fast to 1 and it makes no difference at all
on an ls of any of the WFSC_oracleprod directories.

I am making tuning changes one at a time and seeing what happens ... This
really does seem to be harder than it should be.

Thanks,
Nick
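
P.S. For anyone searching the archives later, the next two changes on my
list are below. The amended fstab line just adds Jeff's suggested options;
the /etc/exports lines are only a sketch of what I mean by exporting
individual directories rather than the whole mount (the host name and the
final list of paths are placeholders until the DBAs come back to me):

/etc/fstab:
/dev/vggfs/lvol00  /apps  gfs  noatime,noquota,_netdev  1 2

/etc/exports:
/apps/prod/prodcomn/admin/log  oradb1(rw,sync,no_root_squash)
/apps/prod/prodcomn/admin/out  oradb1(rw,sync,no_root_squash)

followed by an "exportfs -ra" on the app server that does the exporting.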