On Tue, May 23, 2023 at 02:25:54PM +0800, Wu Guanghao wrote:
> After testing xfs_growfs + fsstress + fault injection, the following stack appeared
> when mounting the filesystem:
>
> [ 149.902032] XFS (loop0): xfs_buf_map_verify: daddr 0x200001 out of range, EOFS 0x200000
> [ 149.902072] WARNING: CPU: 12 PID: 3045 at fs/xfs/xfs_buf.c:535 xfs_buf_get_map+0x5ae/0x650 [xfs]
> ...
> [ 149.902473]  xfs_buf_read_map+0x59/0x330 [xfs]
> [ 149.902621]  ? xlog_recover_items_pass2+0x55/0xd0 [xfs]
> [ 149.902809]  xlog_recover_buf_commit_pass2+0xff/0x640 [xfs]
> [ 149.902959]  ? xlog_recover_items_pass2+0x55/0xd0 [xfs]
> [ 149.903104]  xlog_recover_items_pass2+0x55/0xd0 [xfs]
> [ 149.903247]  xlog_recover_commit_trans+0x2e0/0x330 [xfs]
> [ 149.903390]  xlog_recovery_process_trans+0x8e/0xf0 [xfs]
> [ 149.903531]  xlog_recover_process_data+0x9c/0x130 [xfs]
> [ 149.903687]  xlog_do_recovery_pass+0x3cc/0x5d0 [xfs]
> [ 149.903843]  xlog_do_log_recovery+0x5c/0x80 [xfs]
> [ 149.903984]  xlog_do_recover+0x33/0x1c0 [xfs]
> [ 149.904125]  xlog_recover+0xdd/0x190 [xfs]
> [ 149.904265]  xfs_log_mount+0x125/0x2f0 [xfs]
> [ 149.904410]  xfs_mountfs+0x41a/0x910 [xfs]
> [ 149.904558]  ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs]
> [ 149.904725]  xfs_fs_fill_super+0x4b7/0x940 [xfs]
> [ 149.904873]  ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs]
> [ 149.905016]  get_tree_bdev+0x19a/0x280
> [ 149.905020]  vfs_get_tree+0x29/0xd0
> [ 149.905023]  path_mount+0x69e/0x9b0
> [ 149.905026]  do_mount+0x7d/0xa0
> [ 149.905029]  __x64_sys_mount+0xdc/0x100
> [ 149.905032]  do_syscall_64+0x3e/0x90
> [ 149.905035]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> The trigger process is as follows:
>
> 1. Grow the filesystem size from 0x200000 to 0x300000.
> 2. Use space in the range 0x200000~0x300000.
> 3. The above operations have only been written to the on-disk log.
> 4. Inject a fault and shut down the filesystem.
> 5. Mount the filesystem and replay the growfs log items. This modifies only
>    the superblock buffer, not the in-memory mp->m_sb structure.
> 6. Continue the log replay. When operation 2 is replayed, it uses blocks
>    beyond mp->m_sb.sb_dblocks.
>
> Therefore, during log replay, if any modifications are made to the
> superblock, we should refresh the information recorded in mp->m_sb.
>
> Signed-off-by: Wu Guanghao <wuguanghao3@xxxxxxxxxx>

There are a bunch of things we need to re-init post recovery if the
superblock contents change during recovery. See xlog_do_recover(): if we
are moving the sb log item recovery updates from post-recovery to "at log
item recovery", then we need to be moving everything else in
xlog_do_recover() here as well.

That said....

> ---
>  fs/xfs/xfs_buf_item_recover.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 43167f543afc..2ac3d2083188 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -22,6 +22,8 @@
>  #include "xfs_inode.h"
>  #include "xfs_dir2.h"
>  #include "xfs_quota.h"
> +#include "xfs_sb.h"
> +#include "xfs_ag.h"
>
>  /*
>   * This is the number of entries in the l_buf_cancel_table used during
> @@ -969,6 +971,29 @@ xlog_recover_buf_commit_pass2(
>  			goto out_release;
>  	} else {
>  		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> +		/*
> +		 * If the superblock buffer is modified, we also need to modify the
> +		 * content of the mp.
> +		 */
> +		if (bp->b_maps[0].bm_bn == XFS_SB_DADDR && bp->b_ops) {
> +			struct xfs_dsb *sb = bp->b_addr;
> +
> +			bp->b_ops->verify_write(bp);
> +			error = bp->b_error;
> +			if (error)
> +				goto out_release;
> +
> +			if (be32_to_cpu(sb->sb_agcount) > mp->m_sb.sb_agcount) {
> +				error = xfs_initialize_perag(mp,
> +						be32_to_cpu(sb->sb_agcount),
> +						be64_to_cpu(sb->sb_dblocks),
> +						&mp->m_maxagi);
> +				if (error)
> +					goto out_release;
> +			}
> +
> +			xfs_sb_from_disk(&mp->m_sb, sb);

Ok, so what are we supposed to do here if the filesystem was shrunk? How
do we guarantee that the mods that the shrink might do beyond the new end
of device in the same checkpoint have already been replayed by the time
the superblock change of size is replayed here?

What if feature bits in the superblock were changed? e.g. we add a
feature bit that the filesystem doesn't understand? Or we have items for
recovery in this checkpoint that are in the log after the superblock and
depend on that feature bit not yet having been changed? What if the
superblock got re-logged, so that the feature bit change that subsequent
objects rely on gets moved to later in the checkpoint?

i.e. there appear to be some important item recovery ordering issues we
likely need to address here before we move the in-memory state updates
from post-recovery to mid-recovery. I suspect that fixing this problem so
that superblock updates also guarantee log recovery ordering is also
going to require changing how we update feature and geometry state in
the superblock....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx