Re: DAX 2MB mappings for XFS

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Fri, 12 Jan 2018 15:52:55 -0800

On Fri, Jan 12, 2018 at 11:15:00PM +0000, Kani, Toshi wrote:
> On Sat, 2018-01-13 at 09:27 +1100, Dave Chinner wrote:
> > On Fri, Jan 12, 2018 at 09:38:22PM +0000, Kani, Toshi wrote:
> > > On Sat, 2018-01-13 at 08:19 +1100, Dave Chinner wrote:
> > >  :
> > > > IOWs, what you are seeing is trying to do a very large allocation on
> > > > a very small (8GB) XFS filesystem.  It's rare someone asks to
> > > > allocate >25% of the filesystem space in one allocation, so it's not
> > > > surprising it triggers ENOSPC-like algorithms because it doesn't fit
> > > > into a single AG....
> > > > 
> > > > We can probably look to optimise this, but I'm not sure if we can
> > > > easily differentiate this case (i.e. allocation request larger than
> > > > contiguous free space) from the same situation near ENOSPC when we
> > > > really do have to trim to fit...
> > > > 
> > > > Remember: stripe unit allocation alignment is a hint in XFS that we
> > > > can and do ignore when necessary - it's not a binding rule.
> > > 
> > > Thanks for the clarification!  Can XFS allocate smaller extents so that
> > > each extent fits within an AG?
> > 
> > I've already answered that question:
> > 
> > 	I'm not sure if we can easily differentiate this case (i.e.
> > 	allocation request larger than contiguous free space) from
> > 	the same situation near ENOSPC when we really do have to
> > 	trim to fit...
> 
> Right.  I was thinking of limiting the extent size (e.g. to a half or a
> quarter of the AG size) regardless of the ENOSPC condition, but it may
> be the same thing.
> 
> > > ext4 creates multiple smaller extents for the same request.
> > 
> > Yes, because it has much, much smaller block groups so "allocation >
> > max extent size (128MB)" is a common path.
> > 
> > It's not a common path on XFS - filesystems (and hence AGs) are
> > typically orders of magnitude larger than the maximum extent size
> > (8GB) so the problem only shows up when we're near ENOSPC. XFS is
> > really not optimised for tiny filesystems, and when it comes to pmem
> > we were led to believe we'd have multiple terabytes of pmem in
> > systems by now, not still be stuck with 8GB NVDIMMS. Hence we've
> > spent very little time worrying about such issues because we
> > weren't aiming to support such small capacities for very long...
> 
> I see.  Yes, there will be multiple terabytes of capacity, but it can
> also be divided into multiple smaller namespaces.  So users may
> continue to have relatively small namespaces for their use cases.  If a
> user allocates a namespace that is just big enough to host several
> active files, it may hit this issue regardless of their size.

I am curious, why not just give XFS all the space and let it manage the space?

--D

> Thanks,
> -Toshi
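
For readers following the thread, here is a rough sketch of the idea
discussed in the quoted text above: splitting one large request into
extents that each fit inside a single AG. It is an illustration only and
is not how the XFS allocator works. The function split_request, its
parameters, and the four 2 GiB AGs are assumptions made for the example;
the 8 GiB maximum extent size is the figure quoted in the thread.

```python
GIB = 1024 ** 3


def split_request(request_bytes, ag_free_bytes, max_extent_bytes):
    """Split one allocation request into per-AG extents (hypothetical helper).

    ag_free_bytes is an assumed list holding the contiguous free space of
    each AG. Returns (extents, unallocated), where extents is a list of
    (ag_index, length) tuples and unallocated is what did not fit anywhere.
    """
    extents = []
    remaining = request_bytes
    for ag_index, free in enumerate(ag_free_bytes):
        while remaining > 0 and free > 0:
            # An extent cannot exceed the request, the AG's free space,
            # or the maximum extent size.
            length = min(remaining, free, max_extent_bytes)
            extents.append((ag_index, length))
            remaining -= length
            free -= length
        if remaining == 0:
            break
    return extents, remaining


# Assumed layout: an 8 GiB filesystem with four empty 2 GiB AGs.
ags = [2 * GIB] * 4
extents, unallocated = split_request(3 * GIB, ags, 8 * GIB)
for ag_index, length in extents:
    print(f"AG {ag_index}: {length // GIB} GiB")
print(f"unallocated: {unallocated // GIB} GiB")
```

With this assumed layout, a 3 GiB request (more than 25% of the
filesystem) cannot be satisfied from one AG and ends up as two extents.
The sketch ignores alignment entirely, so it says nothing about whether
the resulting extents would be usable for 2MB DAX mappings.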