On Wed, Mar 24, 2021 at 09:40:36AM +1100, Dave Chinner wrote: > On Fri, Mar 19, 2021 at 10:54:25AM -0400, Brian Foster wrote: > > On Fri, Mar 19, 2021 at 12:43:03PM +1100, Dave Chinner wrote: > > > On Fri, Mar 19, 2021 at 12:05:06PM +1100, Dave Chinner wrote: > > > > On Thu, Mar 18, 2021 at 03:19:01PM -0700, Darrick J. Wong wrote: > > > > > TBH I think the COW recovery and the AG block reservation pieces are > > > > > prime candidates for throwing at an xfs_pwork workqueue so we can > > > > > perform those scans in parallel. > > > > > > > > As I mentioned on #xfs, I think we only need to do the AG read if we > > > > are near enospc. i.e. we can take the entire reservation at mount > > > > time (which is fixed per-ag) and only take away the used from the > > > > reservation (i.e. return to the free space pool) when we actually > > > > access the AGF/AGI the first time. Or when we get a ENOSPC > > > > event, which might occur when we try to take the fixed reservation > > > > at mount time... > > > > > > Which leaves the question about when we need to actually do the > > > accounting needed to fix the bug Brian is trying to fix. Can that be > > > delayed until we read the AGFs or have an ENOSPC event occur? Or > > > maybe some other "we are near ENOSPC and haven't read all AGFs yet" > > > threshold/trigger? > > > > > > > Technically there isn't a hard requirement to read in any AGFs at mount > > time. The tradeoff is that leaves a gap in effectiveness until at least > > the majority of allocbt blocks have been accounted for (via perag agf > > initialization). The in-core counter simply folds into the reservation > > set aside value, so it would just remain at 0 at reservation time and > > behave as if the mechanism didn't exist in the first place. The obvious > > risk is a user can mount the fs and immediately acquire reservation > > without having populated the counter from enough AGs to prevent the > > reservation overrun problem. For that reason, I didn't really consider > > the "lazy" init approach a suitable fix and hooked onto the (mostly) > > preexisting perag res behavior to initialize the appropriate structures > > at mount time. > > > > If that underlying mount time behavior changes, it's not totally clear > > to me how that impacts this patch. If the perag res change relies on an > > overestimated mount time reservation and a fallback to a hard scan on > > -ENOSPC, then I wonder whether the overestimated reservation might > > effectively subsume whatever the allocbt set aside might be for that AG. > > If so, and the perag init effectively transfers excess reservation back > > to free space at the same time allocbt blocks are accounted for (and set > > aside from subsequent reservations), perhaps that has a similar net > > effect as the current behavior (of initializing the allocbt count at > > mount time)..? > > > > One problem is that might be hard to reason about even with code in > > place, let alone right now when the targeted behavior is still > > vaporware. OTOH, I suppose that if we do know right now that the perag > > res scan will still fall back to mount time scans beyond some low free > > space threshold, perhaps it's just a matter of factoring allocbt set > > aside into the threshold somehow so that we know the counter will always > > be initialized before a user can over reserve blocks. > > Yeah, that seems reasonable to me. I don't think it's difficult to > handle - just set the setaside to maximum at mount time, then as we > read in AGFs we replace the maximum setaside for that AG with the > actual btree block usage. If we hit ENOSPC, then we can read in the > uninitialised pags to reduce the setaside from the maximum to the > actual values and return that free space back to the global pool... > Ack. That seems like a generic enough fallback plan if the overestimation of perag reservation doesn't otherwise cover the gap. > > As it is, I don't > > really have a strong opinion on whether we should try to make this fix > > now and preserve it, or otherwise table it and revisit once we know what > > the resulting perag res code will look like. Thoughts? > > It sounds like we have a solid plan to address the AG header access > at mount time, adding this code now doesn't make anything worse, > nor does it appear to prevent us from fixing the AG header access > problem in the future. So I'm happy for this fix to go ahead as it > stands. > Ok, so is that a Rv-b..? ;) So far after a quick skim back through the discussion, I don't have a reason for a v4 of this series... Brian > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >