Re: [PATCH 1/4] xfs: fix bogus minleft manipulations

On Tue, Dec 20, 2016 at 09:17:47AM -0500, Brian Foster wrote:
> On Mon, Dec 19, 2016 at 12:38:26PM +0100, Christoph Hellwig wrote:
> > On Thu, Dec 15, 2016 at 09:34:33AM -0500, Brian Foster wrote:
> > > FWIW, I was playing with this a bit more and managed to manufacture a
> > > filesystem layout that this series doesn't handle too well. Emphasis on
> > > "manufactured" because this might not be a likely real world scenario,
> > > but either way the current code handles it fine.
> > 
> > It does, although mostly by accident.  I suspect with an even better
> > manufactured image you could also drive the current code to its knees,
> > e.g. only have one single block free in the first few AGs, and then
> > a small number just higher than that in a higher AG.
> > 
> 
> Perhaps, I certainly wouldn't expect the code in current form to be
> perfect. It's hard enough to understand as it is. Just trying to avoid
> regressions and properly scope the required fix...
> 
> > > I've attached a metadump of the offending image. mdrestore it, mount and
> > > attempt something like 'dd if=/dev/zero of=/mnt/file' on the root. The
> > > buffered write looks like it's in a livelock, waiting indefinitely for a
> > > writeback cycle that will never complete...
> > 
> > Yeah, that's the loop that keeps going even if it can't allocate any
> > blocks, which seems generally bogus.  But even without that we'd get
> > ENOSPC despite not having a reservation. Which is a little easier to
> > debug, but just as wrong.
> > 
> 
> Indeed.
> 
> > The only good way out I can see is to not hand out any more reservations
> > after we only have nr_ags * xfs_bmap_worst_indlen(1) available.  I'll
> > see if I can come up with a patch for that.
> 
> Hmm, so the idea is to basically find a way we can infer accurate
> information about the per-AG state at the time blocks are reserved from
> the global pool (i.e., buffered write time) and cut off writes at the
> point we can no longer guarantee at least one AG can satisfy the
> smallest write..?

We already do this for per-AG freelist minimum space requirements.
See XFS_ALLOC_AGFL_RESERVE and the big comment above
xfs_alloc_set_aside().

What's worth noting is that xfs_alloc_set_aside() has a magic number
of "4" added to it, which is supposedly for the bmbt split that
might be needed. This is applied at delalloc space reservation time,
so this would seem to me to be the place to hook into here.

I do see a problem here, though - it's only reserving space for a
single BMBT split from the global free space pool. This is fine for
the AGFL reservations (as they are static and fixed in size), but
maybe this is where we are over-committing the freespace pool...
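
To make the arithmetic concrete, here is a rough user-space sketch, not
the kernel code. The struct and the function names ending in _sketch are
hypothetical stand-ins, the value 4 for XFS_ALLOC_AGFL_RESERVE is
assumed, and worst_indlen is passed in by the caller because the real
xfs_bmap_worst_indlen(1) result depends on filesystem geometry. It
contrasts a set-aside that counts the BMBT split once globally with the
per-AG cutoff suggested earlier in the thread.

```c
#include <assert.h>

/* Per-AG blocks held back for the free list (value assumed here). */
#define XFS_ALLOC_AGFL_RESERVE	4

/* The "magic 4": room for one BMBT split, counted once for the whole fs. */
#define BMBT_SPLIT_RESERVE	4

/* Hypothetical stand-in for the mount structure; only the AG count matters. */
struct mount_sketch {
	unsigned int	agcount;
};

/*
 * Blocks withheld from the global free space pool: a fixed AGFL reserve
 * per AG, plus space for a single BMBT split that is not scaled by the
 * number of AGs.
 */
static unsigned int
set_aside_sketch(const struct mount_sketch *mp)
{
	assert(mp->agcount > 0);
	return BMBT_SPLIT_RESERVE + mp->agcount * XFS_ALLOC_AGFL_RESERVE;
}

/*
 * Hypothetical cutoff following the nr_ags * xfs_bmap_worst_indlen(1)
 * idea: stop handing out reservations once free space drops to this
 * many blocks. worst_indlen is an assumed input, not a computed value.
 */
static unsigned int
per_ag_split_threshold_sketch(const struct mount_sketch *mp,
			      unsigned int worst_indlen)
{
	assert(mp->agcount > 0);
	return mp->agcount * worst_indlen;
}
```

With four AGs the first function holds back 4 + 4 * 4 = 20 blocks, of
which only 4 cover BMBT splits no matter how many AGs there are, while
the second grows linearly with the AG count.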

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx