On Mon, Aug 26, 2024 at 09:27:24PM -0700, Darrick J. Wong wrote: > On Tue, Aug 27, 2024 at 11:29:40AM +1000, Dave Chinner wrote: > > On Mon, Aug 26, 2024 at 11:27:34AM -0700, Darrick J. Wong wrote: > > > On Mon, Aug 26, 2024 at 12:06:58PM +1000, Dave Chinner wrote: > > > > On Thu, Aug 22, 2024 at 05:20:07PM -0700, Darrick J. Wong wrote: > > > > > From: Christoph Hellwig <hch@xxxxxx> > > > > > > > > > > Split the check that the rtsummary fits into the log into a separate > > > > > helper, and use xfs_growfs_rt_alloc_fake_mount to calculate the new RT > > > > > geometry. > > > > > > > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > [djwong: avoid division for the 0-rtx growfs check] > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > --- > > > > > fs/xfs/xfs_rtalloc.c | 43 +++++++++++++++++++++++++++++-------------- > > > > > 1 file changed, 29 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c > > > > > index 61231b1dc4b79..78a3879ad6193 100644 > > > > > --- a/fs/xfs/xfs_rtalloc.c > > > > > +++ b/fs/xfs/xfs_rtalloc.c > > > > > @@ -1023,6 +1023,31 @@ xfs_growfs_rtg( > > > > > return error; > > > > > } > > > > > > > > > > +static int > > > > > +xfs_growfs_check_rtgeom( > > > > > + const struct xfs_mount *mp, > > > > > + xfs_rfsblock_t rblocks, > > > > > + xfs_extlen_t rextsize) > > > > > +{ > > > > > + struct xfs_mount *nmp; > > > > > + int error = 0; > > > > > + > > > > > + nmp = xfs_growfs_rt_alloc_fake_mount(mp, rblocks, rextsize); > > > > > + if (!nmp) > > > > > + return -ENOMEM; > > > > > + > > > > > + /* > > > > > + * New summary size can't be more than half the size of the log. This > > > > > + * prevents us from getting a log overflow, since we'll log basically > > > > > + * the whole summary file at once. > > > > > + */ > > > > > + if (nmp->m_rsumblocks > (mp->m_sb.sb_logblocks >> 1)) > > > > > + error = -EINVAL; > > > > > > > > FWIW, the new size needs to be smaller than that, because the "half > > > > the log size" must to include all the log metadata needed to > > > > encapsulate that object. The grwofs transaction also logs inodes and > > > > the superblock, so that also takes away from the maximum size of > > > > the summary file.... > > > > > > <shrug> It's the same logic as what's there now, and there haven't been > > > any bug reports, have there? > > > > No, none that I know of - it was just an observation that the code > > doesn't actually guarantee what the comment says it should do. > > > > > Though I suppose that's just a reduction > > > of what? One block for the rtbitmap, and (conservatively) two inodes > > > and a superblock? > > > > The rtbitmap update might touch a lot more than one block. The newly > > allocated space in the rtbitmap inode is initialised to zeros, and > > so the xfs_rtfree_range() call from the growfs code to mark the new > > space free has to write all 1s to that range of the rtbitmap. This > > is all done in a single transaction, so we might actually be logging > > a *lot* of rtbitmap buffers here. > > > > IIRC, there is a bit per rtextent, so in a 4kB buffer we can mark > > 32768 rtextents free. If they are 4kB each, then that's 128MB of > > space tracked per rtbitmap block. This adds up to roughly 3.5MB of > > log space for the rtbitmap updates per TB of grown rtdev space.... > > > > So, yeah, I think that calculation and comment is inaccurate, but we > > don't have to fix this right now. > > The kernel only "frees" the new space one rbmblock at a time, so I think > that's why this calculation has never misfired. not quite. It iterates all the rbmblocks in the given range (i.e. the entire extent being freed) in xfs_rtmodify_range() in a single transaction, but... > I /think/ that means > that each transaction only ends up logging two rtsummary blocks at a > time? One to decrement a counter, and again to increment one another > level up? ... we only do one update of the summary blocks per extent being freed (i.e. in xfs_rtfree_range() after the call to xfs_rtmodify_range()). So, yes, we should only end up logging two rtsummary blocks per extent being freed, but the number of rbmblocks logged in that same transaction is O(extent length). -Dave. -- Dave Chinner david@xxxxxxxxxxxxx