On Thu, Mar 17, 2022 at 08:56:55AM -0400, Brian Foster wrote: > On Thu, Mar 17, 2022 at 01:05:26PM +1100, Dave Chinner wrote: > > On Wed, Mar 16, 2022 at 11:17:26AM -0700, Darrick J. Wong wrote: > > > On Wed, Mar 16, 2022 at 01:29:01PM -0400, Brian Foster wrote: > > > > On Wed, Mar 16, 2022 at 09:32:16AM -0700, Darrick J. Wong wrote: > > > > > On Wed, Mar 16, 2022 at 07:28:18AM -0400, Brian Foster wrote: > > > > > > On Mon, Mar 14, 2022 at 11:08:47AM -0700, Darrick J. Wong wrote: > > > > Similar deal as above.. I'm more interested in a potential cleanup of > > > > the code that helps prevent this sort of buglet for the next user of > > > > ->m_alloc_set_aside that will (expectedly) have no idea about this > > > > subtle quirk than I am about what's presented in the free space > > > > counters. ISTM that we ought to just ditch ->m_alloc_set_aside, replace > > > > the existing xfs_alloc_set_aside() with an XFS_ALLOC_FS_RESERVED() macro > > > > or something that just does the (agcount << 3) thing, and then define a > > > > > > I'm not sure that the current xfs_alloc_set_aside code is correct. > > > Right now it comes with this comment: > > > > > > "We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to > > > handle a potential split of the file's bmap btree." > > > > > > I think the first part ("4 fsbs _per AG_ for the freelist") is wrong. > > > AFAICT, that part refers to the number of blocks we need to keep free in > > > case we have to replenish a completely empty AGFL. The hardcoded value > > > of 4 seems wrong, since xfs_alloc_min_freelist() is what _fix_freelist > > > uses to decide how big the AGFL needs to be, and it returns 6 on a > > > filesystem that has rmapbt enabled. So I think XFS_ALLOC_AGFL_RESERVE > > > is wrong here and should be replaced with the function call. > > > > Back when I wrote that code (circa 2007, IIRC), that was actually > > correct according to the reservations that were made when freeing > > an extent at ENOSPC. > > > > We needed 4 blocks for the AGFL fixup to always succeed - 2 blocks > > for each BNO and CNT btrees, and, IIRC, the extent free reservation > > was just 4 blocks at that time. Hence the 4+4 value. > > > > However, you are right that rmap also adds another per-ag btree that > > is allocated from the agfl and that set_aside() should be taking > > that into accout. That said, I think that xfs_alloc_min_freelist() > > might even be wrong by just adding 2 blocks to the AGFL for the > > rmapbt. > > > > That is, at ENOSPC the rmapbt can be a *big* btree. It's not like > > the BNO and CNT btrees which are completely empty at that point in > > time; the RMAP tree could be one level below max height, and freeing > > a single block could split a rmap rec and trigger a full height RMAP > > split. > > > > So the minimum free list length in that case is 2 + 2 + MAX_RMAP_HEIGHT. > > > > > I also think the second part ("and 4 more to handle a split of the > > > file's bmap btree") is wrong. If we're really supposed to save enough > > > blocks to handle a bmbt split, then I think this ought to be > > > (mp->m_bm_maxlevels[0] - 1), not 4, right? According to xfs_db, bmap > > > btrees can be 9 levels tall: > > > > Yes, we've changed the BMBT reservations in the years since that > > code was written to handle max height reservations correctly, too. > > So, like the RMAP btree reservation, we probably should be reserving > > MAX_BMAP_HEIGHT in the set-aside calculation. > > > > refcount btree space is handled by the ag_resv code and blocks > > aren't allocated from the AGFL, so I don't think we need to take > > taht into account for xfs_alloc_set_aside(). > > > > > So in the end, I think that calculation should become: > > > > > > unsigned int > > > xfs_alloc_set_aside( > > > struct xfs_mount *mp) > > > { > > > unsigned int min-agfl = xfs_alloc_min_freelist(mp, NULL); > > > > > > return mp->m_sb.sb_agcount * (min_agfl + mp->m_bm_maxlevels[0] - 1); > > > } > > > > *nod*, but with the proviso that xfs_alloc_min_freelist() doesn't > > appear to be correct, either.... > > > > Also, that's a fixed value for the physical geometry of the > > filesystem, so it should be calculated once at mount time and stored > > in the xfs_mount (and only updated if needed at growfs time)... > > > > To my earlier point... please just don't call this fixed mount value > "set_aside" if that's not what it actually is. Rename the field and > helper to something self-descriptive based on whatever fixed components > it's made up of (you could even qualify it as a subcomponent of > set_aside with something like ".._agfl_bmap_set_aside" or whatever) then > reserve the _set_aside() name for the helper that calculates and > documents what the actual/final/dynamic "set aside" value is. Ahh, ok, that's what you were getting at. Every time I look at "alloc_set_aside" I have to figure out what that /really/ does, and I think "agfl_bmap_setaside" (or even "bmbt_split_setaside") is a better hint to what this actually does. --D > Brian > > > > > new xfs_alloc_set_aside() that combines the macro calculation with > > > > ->m_allocbt_blks. Then the whole "set aside" concept is calculated and > > > > documented in one place. Hm? > > > > > > I think I'd rather call the new function xfs_fdblocks_avail() over > > > reusing an existing name, because I fear that zapping an old function > > > and replacing it with a new function with the same name will cause > > > confusion for anyone backporting patches or reading code after an > > > absence. > > > > > > Also the only reason we have a mount variable and a function (instead of > > > a macro) is that Dave asked me to change the codebase away from the > > > XFS_ALLOC_AG_MAX_USABLE/XFS_ALLOC_SET_ASIDE macros as part of merging > > > reflink. > > > > Yeah, macros wrapping a variable or repeated constant calculation > > are bad, and it's something we've been cleaning up for a long > > time... > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@xxxxxxxxxxxxx > > >