Re: Fragmentation Issue We Are Having

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 17 Apr 2012 10:26:10 +1000

On Fri, Apr 13, 2012 at 09:17:25AM +0100, Brian Candler wrote:
> On Fri, Apr 13, 2012 at 05:56:34PM +1000, Dave Chinner wrote:
> > In some cases.
> > 
> > You can't just blindly assert that something is needed purely on
> > the size of the filesystem.
> 
> Thanks, but then perhaps the XFS FAQ needs updating. It warns that you might
> have compatibility problems with old clients (NFS) and inode64, but it
> doesn't say "for some workloads inode32 may perform better than inode64 on
> large filesystems".

The FAQ doesn't say anything about whether inode32 performs better
than inode64 or vice versa. All it talks about is inode allocation
locality and possible errors (like ENOSPC with lots of free space)
that can occur with inode32.

> Also, aren't these orthogonal features?
> 
> (1) "I want all my inode metadata stored at the front of the disk"
> 
> (2) "I want files in the same directory to be distributed between AGs, not
>     stored in the same AG"
> 
> If there are not explicit knobs for these behaviours, then it seems almost
> accidental that limiting yourself to 32-bit inode numbers causes them to
> happen (an implementation artefact).

The behaviour of inode32 was defined long before anyone who
currently works on XFS had any say in the matter. Most people really
consider it a nasty hack that was done to avoid needing to make NFS
clients 64 bit inode number clean back in 1998. At the time it was
unpopular, but considered the least worst solution to the problem.
The biggest issue was that it was made the default mount option...

It's now a historical artifact, and all we are doing is preserving
the behaviour of the allocation policies because there are plenty of
applications out there that rely on the way inode32 or inode64
behaves to achieve their performance.

> Finally, what happens if you have a filesystem smaller than 1TB? I imagine
> that XFS will scale the agsize down so that you have multiple AGs, but will
> still have 32-bit inode numbers - so you will get the same behaviour as
> inode64 on a large filesystem.  What happens then if your workload requires
> behaviour (1) and/or (2) above for optimal performance?

Then you get to choose the least worst option.
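
As a rough illustration of where the 1TB figure comes from, here is a
minimal sketch. It assumes the usual inode number layout (AG number,
then block within the AG, then inode within the block, each field
rounded up to whole bits), 4k blocks and 256 byte inodes. The function
names and defaults are made up for the example; this is not XFS code.

```python
def inode_number_bits(fs_bytes, ag_bytes, block_bytes=4096, inode_bytes=256):
    """Upper bound on the bits needed for the largest inode number.

    Assumed layout: (agno << (agblklog + inopblog)) | (agbno << inopblog) | offset
    """
    ag_count = -(-fs_bytes // ag_bytes)            # ceiling division
    agno_bits = (ag_count - 1).bit_length()        # bits for the highest AG number
    ag_blocks = ag_bytes // block_bytes
    agblk_bits = (ag_blocks - 1).bit_length()      # log2 of AG blocks, rounded up
    inodes_per_block = block_bytes // inode_bytes  # assumed to be a power of two
    inopb_bits = inodes_per_block.bit_length() - 1
    return agno_bits + agblk_bits + inopb_bits


def fits_inode32(fs_bytes, ag_bytes, **kwargs):
    """True if every inode number in the filesystem fits in 32 bits."""
    return inode_number_bits(fs_bytes, ag_bytes, **kwargs) <= 32


TIB = 2**40
print(inode_number_bits(1 * TIB, TIB // 4))  # 1TB, 4 AGs
print(inode_number_bits(4 * TIB, TIB))       # 4TB, 4 AGs
```

Under those assumptions a filesystem of 1TB or less never generates
an inode number wider than 32 bits, so inode32 and inode64 have
nothing to differ on there. Larger inodes move the threshold up
accordingly.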

Making allocation policy more flexible is something that I've been
wanting to do for years - it was something I was working on when I
left SGI almost 4 years ago (along with metadata checksums). Here's
the patchset of what I'd written from that time:

http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

You're more than welcome to pick it up and start working on it again
so that we can have a much more flexible allocation subsystem, if you
want...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
