Hello! On 2023/9/8 20:43, Zhihao Cheng wrote: > JBD2 makes sure journal data is fallen on fs device by sync_blockdev(), > however, other process could intercept the EIO information from bdev's > mapping, which leads journal recovering successful even EIO occurs during > data written back to fs device. > > We found this problem in our product, iscsi + multipath is chosen for block > device of ext4. Unstable network may trigger kpartx to rescan partitions in > device mapper layer. Detailed process is shown as following: > > mount kpartx irq > jbd2_journal_recover > do_one_pass > memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal > mark_buffer_dirty // mark bh dirty > vfs_read > generic_file_read_iter // dio > filemap_write_and_wait_range > __filemap_fdatawrite_range > do_writepages > block_write_full_folio > submit_bh_wbc > >> EIO occurs in disk << > end_buffer_async_write > mark_buffer_write_io_error > mapping_set_error > set_bit(AS_EIO, &mapping->flags) // set! > filemap_check_errors > test_and_clear_bit(AS_EIO, &mapping->flags) // clear! > err2 = sync_blockdev > filemap_write_and_wait > filemap_check_errors > test_and_clear_bit(AS_EIO, &mapping->flags) // false > err2 = 0 > > Filesystem is mounted successfully even data from journal is failed written > into disk, and ext4 could become corrupted. > > Fix it by comparing 'sbi->s_bdev_wb_err' before loading journal and after > loading journal. > > Fetch a reproducer in [Link]. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888 > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx> > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > --- > v1->v2: Checks wb_err from block device only in ext4. > fs/ext4/super.c | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 38217422f938..4dcaad2403be 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -4907,6 +4907,14 @@ static int ext4_load_and_init_journal(struct super_block *sb, > if (err) > return err; > > + err = errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, > + &sbi->s_bdev_wb_err); > + if (err) { > + ext4_msg(sb, KERN_ERR, "Background error %d when loading journal", > + err); > + goto out; > + } > + This solution cannot solve the problem, because the journal tail is still updated in journal_reset() even if we detect the writeback error and refuse to mount the ext4 filesystem here. So I suppose we have to check the I/O error by jbd2 module itself like v1 does. Thanks, Yi. > if (ext4_has_feature_64bit(sb) && > !jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0, > JBD2_FEATURE_INCOMPAT_64BIT)) { > @@ -5365,6 +5373,13 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) > goto failed_mount3a; > } > > + /* > + * Save the original bdev mapping's wb_err value which could be > + * used to detect the metadata async write error. > + */ > + spin_lock_init(&sbi->s_bdev_wb_lock); > + errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, > + &sbi->s_bdev_wb_err); > err = -EINVAL; > /* > * The first inode we look at is the journal inode. Don't try > @@ -5571,13 +5586,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) > } > #endif /* CONFIG_QUOTA */ > > - /* > - * Save the original bdev mapping's wb_err value which could be > - * used to detect the metadata async write error. > - */ > - spin_lock_init(&sbi->s_bdev_wb_lock); > - errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, > - &sbi->s_bdev_wb_err); > EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS; > ext4_orphan_cleanup(sb, es); > EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS; >