Re: Cluster Project FAQ - GFS tuning section

Robert,

> What version of the gfs_mkfs code were you running to get this?

gfs_mkfs -V produced the following results:

gfs_mkfs 6.1.6 (built May 9 2006 17:48:45)
Copyright (C) Red Hat, Inc.  2004-2005 All rights reserved

Thanks,
Jon

On 1/11/07, Robert Peterson <rpeterso@xxxxxxxxxx> wrote:
Jon Erickson wrote:
> I have a couple of questions regarding the Cluster Project FAQ - GFS
> tuning section (http://sources.redhat.com/cluster/faq.html#gfs_tuning).
>
> First:
> -    Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems.
> I noticed that when I used the -r 2048 switch while creating my file
> system, it ended up creating the file system with a 256MB resource
> group size.  When I omitted the -r flag, the file system was created
> with a 2048MB resource group size.  Is there a problem with the -r
> flag, and does gfs_mkfs dynamically come up with the best resource
> group size based on your file system size?  Another thing I did that
> ended up causing a problem was running the gfs_mkfs command while my
> current GFS file system was mounted.  The command completed
> successfully, but when I went into the mount point all the old files
> and directories still showed up.  When I attempted to remove files,
> bad things happened: I believe I received an invalid metadata block
> error, and the cluster went into an infinite loop trying to restart
> the service.  I ended up fixing this by unmounting the file system,
> re-creating the GFS file system, and re-mounting.  This problem was
> caused by my own user error, but maybe there should be some sort of
> check that determines whether the file system is currently mounted.
>
> Second:
> -    Break file systems up when huge numbers of files are involved.
> The FAQ states that there is a certain amount of overhead when
> dealing with lots (millions) of files.  What is the recommended limit
> on the number of files in a file system?  The theoretical limit of
> 8 exabytes for a file system does not seem at all realistic if you
> can't have millions of files in a file system.
>
> I'm just curious to see what everyone thinks about this.  Thanks.
>
>
Hi Jon,

The newer gfs_mkfs (gfs1) and mkfs.gfs2 (gfs2) in CVS HEAD will choose
the RG size based on the size of the file system if "-r" is not
specified, so that would explain why it used 2048 in the case where
you didn't specify -r.  Previous versions always assumed 256MB unless
-r was specified.
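
For reference, the two cases would look something like this (the device
path, lock table, and journal count below are just placeholders, not a
recommendation):

  # Explicit 2048MB resource groups -- this should not fall back to 256MB:
  gfs_mkfs -p lock_dlm -t mycluster:mygfs -j 2 -r 2048 /dev/my_vg/my_lv

  # No -r: the newer gfs_mkfs picks an RG size based on the file system size:
  gfs_mkfs -p lock_dlm -t mycluster:mygfs -j 2 /dev/my_vg/my_lv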

If you specified -r 2048 and it used 256 for its rg size, that would be
a bug.
What version of the gfs_mkfs code were you running to get this?

I agree that it would be very nice if all the userspace GFS-related
tools made sure the file system is not mounted anywhere before running.
We even have a bugzilla from long ago about this regarding gfs_fsck:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156012

It's easy enough to check whether the local node (the one running mkfs
or fsck) has it mounted, but it's harder to figure out whether other
nodes do, because the userland tools can't assume access to the cluster
infrastructure the way the kernel code can.  So we haven't thought of
an elegant solution to this yet; we would almost need to query every
node and check its cman_tool services output to see if it is using
resources pertaining to the file system.  That would require some kind
of connection (e.g. ssh), and what should the tool do when it can't
contact a node that's powered off?
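
For the local node, a rough sketch of the kind of pre-flight check the
tools could do might look like this (the device path and node names are
placeholders; this is not code from the actual tools):

  # Refuse to run if the device appears in the local /proc/mounts.
  DEVICE=/dev/my_vg/my_lv
  if grep -q "^$DEVICE " /proc/mounts; then
      echo "$DEVICE appears to be mounted locally; aborting" >&2
      exit 1
  fi
  # For other nodes we'd have to ask them somehow, e.g. over ssh --
  # and if a node is down or unreachable, the check silently passes,
  # which is exactly the problem described above.
  for node in node1 node2 node3; do
      ssh "$node" "grep -q '^$DEVICE ' /proc/mounts" && \
          { echo "$DEVICE is mounted on $node" >&2; exit 1; }
  done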

Regarding the number of files in a GFS file system: I don't have any
particular recommendation because I haven't studied the exact
performance impact of the number of inodes.  It would be cool if
someone could run some tests and see where the performance starts to
degrade.
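
If anyone wants to try, a crude starting point might be something like
this, creating files in batches on a test GFS mount and watching how
long each batch takes as the total count grows (the mount point and
counts are placeholders):

  # Create 100 batches of 10,000 files each and time every batch (bash).
  mkdir -p /mnt/gfs/test
  for batch in $(seq 1 100); do
      dir=/mnt/gfs/test/dir-$batch
      mkdir "$dir"
      time ( seq -f "$dir/file-%.0f" 1 10000 | xargs touch )
  done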

The cluster team at Red Hat can work toward improving the performance
of GFS (in fact, we are; hence the change to gfs_mkfs for the rg size),
but many of the performance issues are already addressed in GFS2, and
since GFS2 was accepted into the upstream Linux kernel, I think it
makes more sense to focus more of our effort there.

One thing I thought about doing was trying to use btrees instead of
linked lists for some of our more critical resources, like the RGs and
the glocks.  We'd have to figure out the impact of doing that; the
overhead of maintaining the trees might itself impact performance.
Just my $0.02.

Regards,

Bob Peterson
Red Hat Cluster Suite




--
Jon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
