Re: [PATCH v3 1/2] xfs: set a mount flag when perag reservation is active

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 24 Mar 2021 10:24:03 -0400

On Wed, Mar 24, 2021 at 09:40:36AM +1100, Dave Chinner wrote:
> On Fri, Mar 19, 2021 at 10:54:25AM -0400, Brian Foster wrote:
> > On Fri, Mar 19, 2021 at 12:43:03PM +1100, Dave Chinner wrote:
> > > On Fri, Mar 19, 2021 at 12:05:06PM +1100, Dave Chinner wrote:
> > > > On Thu, Mar 18, 2021 at 03:19:01PM -0700, Darrick J. Wong wrote:
> > > > > TBH I think the COW recovery and the AG block reservation pieces are
> > > > > prime candidates for throwing at an xfs_pwork workqueue so we can
> > > > > perform those scans in parallel.
> > > > 
> > > > As I mentioned on #xfs, I think we only need to do the AG read if we
> > > > are near enospc. i.e. we can take the entire reservation at mount
> > > > time (which is fixed per-ag) and only take away the used from the
> > > > reservation (i.e. return to the free space pool) when we actually
> > > > access the AGF/AGI the first time. Or when we get a ENOSPC
> > > > event, which might occur when we try to take the fixed reservation
> > > > at mount time...
> > > 
> > > Which leaves the question about when we need to actually do the
> > > accounting needed to fix the bug Brian is trying to fix. Can that be
> > > delayed until we read the AGFs or have an ENOSPC event occur? Or
> > > maybe some other "we are near ENOSPC and haven't read all AGFs yet"
> > > threshold/trigger?
> > > 
> > 
> > Technically there isn't a hard requirement to read in any AGFs at mount
> > time. The tradeoff is that leaves a gap in effectiveness until at least
> > the majority of allocbt blocks have been accounted for (via perag agf
> > initialization). The in-core counter simply folds into the reservation
> > set aside value, so it would just remain at 0 at reservation time and
> > behave as if the mechanism didn't exist in the first place. The obvious
> > risk is a user can mount the fs and immediately acquire reservation
> > without having populated the counter from enough AGs to prevent the
> > reservation overrun problem. For that reason, I didn't really consider
> > the "lazy" init approach a suitable fix and hooked onto the (mostly)
> > preexisting perag res behavior to initialize the appropriate structures
> > at mount time.
> > 
> > If that underlying mount time behavior changes, it's not totally clear
> > to me how that impacts this patch. If the perag res change relies on an
> > overestimated mount time reservation and a fallback to a hard scan on
> > -ENOSPC, then I wonder whether the overestimated reservation might
> > effectively subsume whatever the allocbt set aside might be for that AG.
> > If so, and the perag init effectively transfers excess reservation back
> > to free space at the same time allocbt blocks are accounted for (and set
> > aside from subsequent reservations), perhaps that has a similar net
> > effect as the current behavior (of initializing the allocbt count at
> > mount time)..?
> > 
> > One problem is that might be hard to reason about even with code in
> > place, let alone right now when the targeted behavior is still
> > vaporware. OTOH, I suppose that if we do know right now that the perag
> > res scan will still fall back to mount time scans beyond some low free
> > space threshold, perhaps it's just a matter of factoring allocbt set
> > aside into the threshold somehow so that we know the counter will always
> > be initialized before a user can over reserve blocks.
> 
> Yeah, that seems reasonable to me. I don't think it's difficult to
> handle - just set the setaside to maximum at mount time, then as we
> read in AGFs we replace the maximum setaside for that AG with the
> actual btree block usage. If we hit ENOSPC, then we can read in the
> uninitialised pags to reduce the setaside from the maximum to the
> actual values and return that free space back to the global pool...
> 

Ack. That seems like a generic enough fallback plan if the
overestimation of perag reservation doesn't otherwise cover the gap.

> > As it is, I don't
> > really have a strong opinion on whether we should try to make this fix
> > now and preserve it, or otherwise table it and revisit once we know what
> > the resulting perag res code will look like. Thoughts?
> 
> It sounds like we have a solid plan to address the AG header access
> at mount time, adding this code now doesn't make anything worse,
> nor does it appear to prevent us from fixing the AG header access
> problem in the future. So I'm happy for this fix to go ahead as it
> stands.
> 

Ok, so is that a Rv-b..? ;)

So far after a quick skim back through the discussion, I don't have a
reason for a v4 of this series...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>