On Fri, Mar 19, 2021 at 10:54:25AM -0400, Brian Foster wrote: > On Fri, Mar 19, 2021 at 12:43:03PM +1100, Dave Chinner wrote: > > On Fri, Mar 19, 2021 at 12:05:06PM +1100, Dave Chinner wrote: > > > On Thu, Mar 18, 2021 at 03:19:01PM -0700, Darrick J. Wong wrote: > > > > TBH I think the COW recovery and the AG block reservation pieces are > > > > prime candidates for throwing at an xfs_pwork workqueue so we can > > > > perform those scans in parallel. > > > > > > As I mentioned on #xfs, I think we only need to do the AG read if we > > > are near enospc. i.e. we can take the entire reservation at mount > > > time (which is fixed per-ag) and only take away the used from the > > > reservation (i.e. return to the free space pool) when we actually > > > access the AGF/AGI the first time. Or when we get a ENOSPC > > > event, which might occur when we try to take the fixed reservation > > > at mount time... > > > > Which leaves the question about when we need to actually do the > > accounting needed to fix the bug Brian is trying to fix. Can that be > > delayed until we read the AGFs or have an ENOSPC event occur? Or > > maybe some other "we are near ENOSPC and haven't read all AGFs yet" > > threshold/trigger? > > > > Technically there isn't a hard requirement to read in any AGFs at mount > time. The tradeoff is that leaves a gap in effectiveness until at least > the majority of allocbt blocks have been accounted for (via perag agf > initialization). The in-core counter simply folds into the reservation > set aside value, so it would just remain at 0 at reservation time and > behave as if the mechanism didn't exist in the first place. The obvious > risk is a user can mount the fs and immediately acquire reservation > without having populated the counter from enough AGs to prevent the > reservation overrun problem. For that reason, I didn't really consider > the "lazy" init approach a suitable fix and hooked onto the (mostly) > preexisting perag res behavior to initialize the appropriate structures > at mount time. > > If that underlying mount time behavior changes, it's not totally clear > to me how that impacts this patch. If the perag res change relies on an > overestimated mount time reservation and a fallback to a hard scan on > -ENOSPC, then I wonder whether the overestimated reservation might > effectively subsume whatever the allocbt set aside might be for that AG. > If so, and the perag init effectively transfers excess reservation back > to free space at the same time allocbt blocks are accounted for (and set > aside from subsequent reservations), perhaps that has a similar net > effect as the current behavior (of initializing the allocbt count at > mount time)..? > > One problem is that might be hard to reason about even with code in > place, let alone right now when the targeted behavior is still > vaporware. OTOH, I suppose that if we do know right now that the perag > res scan will still fall back to mount time scans beyond some low free > space threshold, perhaps it's just a matter of factoring allocbt set > aside into the threshold somehow so that we know the counter will always > be initialized before a user can over reserve blocks. Yeah, that seems reasonable to me. I don't think it's difficult to handle - just set the setaside to maximum at mount time, then as we read in AGFs we replace the maximum setaside for that AG with the actual btree block usage. If we hit ENOSPC, then we can read in the uninitialised pags to reduce the setaside from the maximum to the actual values and return that free space back to the global pool... > As it is, I don't > really have a strong opinion on whether we should try to make this fix > now and preserve it, or otherwise table it and revisit once we know what > the resulting perag res code will look like. Thoughts? It sounds like we have a solid plan to address the AG header access at mount time, adding this code now doesn't make anything worse, nor does it appear to prevent us from fixing the AG header access problem in the future. So I'm happy for this fix to go ahead as it stands. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx