On Thu, Jun 01, 2023 at 03:55:53PM +0800, Long Li wrote: > Hi, Darrick > > I realized there may be a problem with my email so you may not receive my > previous reply. So I'm replying to you with another email address, and if > you received my previous reply, you can ignore this email. Thank you very much! Oh, I saw it, but I've been on vacation for a week and hadn't responded to it (or much anything else) ... > On Wed, May 24, 2023 at 07:48:00AM -0700, Darrick J. Wong wrote: > > On Wed, May 24, 2023 at 08:10:41PM +0800, Long Li wrote: > > > I found a corruption during growfs: > > > > > > XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of > > > file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0 > > > CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x50/0x70 > > > xfs_corruption_error+0x134/0x150 > > > __xfs_free_extent+0x2c1/0x3c0 > > > xfs_ag_extend_space+0x291/0x3e0 > > > xfs_growfs_data+0xd72/0xe90 > > > xfs_file_ioctl+0x5f9/0x14a0 > > > __x64_sys_ioctl+0x13e/0x1c0 > > > do_syscall_64+0x39/0x80 > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > XFS (loop0): Corruption detected. Unmount and run xfs_repair > > > XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file > > > fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90 > > > CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x50/0x70 > > > xfs_error_report+0x93/0xc0 > > > xfs_trans_cancel+0x2c0/0x350 > > > xfs_growfs_data+0x691/0xe90 > > > xfs_file_ioctl+0x5f9/0x14a0 > > > __x64_sys_ioctl+0x13e/0x1c0 > > > do_syscall_64+0x39/0x80 > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > RIP: 0033:0x7f2d86706577 > > > > > > The bug can be reproduced with the following sequence: > > > > > > # truncate -s 1073741824 xfs_test.img > > > # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img > > > # truncate -s 2305843009213693952 xfs_test.img > > > # mount -o loop xfs_test.img /mnt/test > > > # xfs_growfs -D 1125899907891200 /mnt/test > > > > > > The root cause is that during growfs, user space passed in a large value > > > of newblcoks to xfs_growfs_data_private(), due to current sb_agblocks is > > > too small, new AG count will exceed UINT_MAX. Because of AG number type > > > is unsigned int and it would overflow, that caused nagcount much smaller > > > than the actual value. During AG extent space, delta blocks in > > > xfs_resizefs_init_new_ags() will much larger than the actual value due to > > > incorrect nagcount, even exceed UINT_MAX. This will cause corruption and > > > be detected in __xfs_free_extent. Fix it by growing the filesystem to up > > > to the maximally allowed AGs when new AG count overflow. > > > > > > Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx> > > > --- > > > v3: > > > - Ensure that the performance is consisent before and after the modification > > > when nagcount just overflows and 0 < nb_mod < XFS_MIN_AG_BLOCKS. > > > - Based on Darrick's advice, growing the filesystem to up to the maximally > > > allowed AGs when new AG count overflow. > > > > > > fs/xfs/libxfs/xfs_fs.h | 3 +++ > > > fs/xfs/xfs_fsops.c | 13 +++++++++---- > > > 2 files changed, 12 insertions(+), 4 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h > > > index 1cfd5bc6520a..36ec2b1f7e68 100644 > > > --- a/fs/xfs/libxfs/xfs_fs.h > > > +++ b/fs/xfs/libxfs/xfs_fs.h > > > @@ -248,6 +248,9 @@ typedef struct xfs_fsop_resblks { > > > #define XFS_MAX_LOG_BLOCKS (1024 * 1024ULL) > > > #define XFS_MIN_LOG_BYTES (10 * 1024 * 1024ULL) > > > > > > +/* Maximum number of AGs */ > > > +#define XFS_AGNUMBER_MAX ((xfs_agnumber_t)(-1U)) > > > > My apologies, I led you astray. This should be: > > > > #define XFS_MAX_AGNUMBER ((xfs_agnumber_t)(NULLAGNUMBER - 1)) > > OK, it is undoubtedly better to follow the definition in xfsprogs, which will > be modified in the next version. > > > > > since it already exists in multidisk.h from xfsprogs. > > xfs_validate_sb_common probably ought to validate that > > sb_agcount <= XFS_MAX_AGNUMBER as well. > > In my understanding the range of sb_agcount should be [1, 0xFFFFFFFF], it > should validate that sb_agcount <= (XFS_MAX_AGNUMBER + 1) , so it's not necessary > to check the upper limit of the sb_agcount. Yes, you're correct, no new validation is needed. --D > > Thanks, > Long Li > > > > > > + > > > /* > > > * Limits on sb_agblocks/sb_agblklog -- mkfs won't format AGs smaller than > > > * 16MB or larger than 1TB. > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > index 13851c0d640b..2b37d3e61bdf 100644 > > > --- a/fs/xfs/xfs_fsops.c > > > +++ b/fs/xfs/xfs_fsops.c > > > @@ -115,11 +115,16 @@ xfs_growfs_data_private( > > > > > > nb_div = nb; > > > nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); > > > - nagcount = nb_div + (nb_mod != 0); > > > - if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) { > > > - nagcount--; > > > - nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks; > > > + if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) > > > + nb_div++; > > > + else if (nb_mod) > > > + nb = nb_div * mp->m_sb.sb_agblocks; > > > + > > > + if (nb_div > XFS_AGNUMBER_MAX) { > > > + nb_div = XFS_AGNUMBER_MAX; > > > + nb = nb_div * mp->m_sb.sb_agblocks; > > > } > > > + nagcount = nb_div; > > > delta = nb - mp->m_sb.sb_dblocks; > > > /* > > > * Reject filesystems with a single AG because they are not > > > -- > > > 2.31.1 > > > >