On Wed, Mar 23, 2016 at 09:10:59AM -0400, Brian Foster wrote:
> On Wed, Mar 23, 2016 at 02:56:25PM +0200, Nikolay Borisov wrote:
> > On 03/23/2016 02:43 PM, Brian Foster wrote:
> > > On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote:
> ...
> > > It looks like it's working to add a new extent to the in-core extent
> > > list. If this is the stack associated with the warning message (combined
> > > with the large alloc size), I wonder if there's a fragmentation issue on
> > > the file leading to an excessive number of extents.
> >
> > Yes, this is the stack trace associated.
> >
> > > What does 'xfs_bmap -v /storage/loop/file1' show?
> >
> > It spews a lot of stuff, but here is a summary; more detailed info can be
> > provided if you need it:
> >
> > xfs_bmap -v /storage/loop/file1 | wc -l
> > 900908
> > xfs_bmap -v /storage/loop/file1 | grep -c hole
> > 94568
> >
> > Also, what would constitute an "excessive number of extents"?
>
> I'm not sure where one would draw the line, tbh; it's just a matter of
> having too many extents to the point that it causes problems in terms of
> performance (i.e., reading/modifying the extent list) or allocation
> failures such as the one you're running into. As it is, XFS maintains the
> full extent list for an active inode in memory, so that's 800k+ extents
> that it's looking for memory for.
>
> It looks like that is your problem here.

800k or so extents over 878G works out to about 1MB per extent, which I
wouldn't call excessive. I use a 1MB extent size hint on all my VM images,
as this allows the underlying device to do IOs large enough to maintain
close to full bandwidth when reading and writing regions of the underlying
image file that are non-contiguous w.r.t. sequential IO from the guest.
Mind you, it's not until I use ext4 or btrfs in the guests that I actually
see significant increases in extent count.
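As a sanity check on that back-of-the-envelope figure, the average extent size can be computed directly from the numbers quoted above (using a round 800k for the extent count, since the thread only says "800k or so"):

```shell
#!/bin/sh
# Rough arithmetic only: ~878G file divided by ~800k extents
# (round figures taken from the thread above, not exact values).
size_bytes=$((878 * 1024 * 1024 * 1024))  # ~878G image file
extents=800000                            # ~800k extents per xfs_bmap
avg=$((size_bytes / extents))
echo "average extent size: $((avg / 1024)) KiB"
# prints: average extent size: 1150 KiB
```

That's ~1.1MiB per extent, consistent with the 1MB extent size hint doing its job.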
Rule of thumb in my testing is that if XFS creates 100k extents in the
image file, ext4 will create 500k, and btrfs will create somewhere between
1m and 5m extents.... i.e. XFS as a guest filesystem results in much lower
image file fragmentation than the other options....

As it is, yes, the memory allocation problem is with the in-core extent
tree, and we've known about it for some time. The issue is that as memory
gets fragmented, the top-level indirection array grows too large to be
allocated as a contiguous chunk. When this happens really depends on
memory load, uptime and the way the extent tree is being modified.

I'm working on prototype patches to convert it to an in-memory btree, but
they are far from ready at this point. This isn't straightforward because
all the extent management code assumes extents are kept in a linear array
and can be directly indexed by array offset rather than file offset. I
also want to make sure we can demand page the extent list if necessary,
and that also complicates things like locking, as we currently assume the
extent list is either completely in memory or not in memory at all.

Fundamentally, I don't want to repeat the mistakes ext4 and btrfs have
made with their fine-grained in-memory extent trees based on rb-trees
(e.g. global locks, shrinkers that don't scale or consume way too much
CPU, excessive memory consumption, etc.), so solving all aspects of the
problem in one go is somewhat complex. And, of course, there's so much
other stuff that needs to be done at the same time that I can't find much
time to work on it at the moment...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs