On Mon 13-03-23 21:20:17, Zhihao Cheng wrote:
> The following process makes ext4 load stale buffer heads left over from a
> previous failed mount in a new mount operation:
> mount_bdev
>  ext4_fill_super
>  | ext4_load_and_init_journal
>  |  ext4_load_journal
>  |   jbd2_journal_load
>  |    load_superblock
>  |     journal_get_superblock
>  |      set_buffer_verified(bh) // buffer head is verified
>  |    jbd2_journal_recover // fails with EIO
>  | goto failed_mount3a // skips 'sb->s_root' initialization
>  deactivate_locked_super
>   kill_block_super
>    generic_shutdown_super
>     if (sb->s_root)
>     // false, so ext4_put_super->invalidate_bdev->
>     // invalidate_mapping_pages->mapping_evict_folio->
>     // filemap_release_folio->try_to_free_buffers is skipped and
>     // the buffer head is not dropped.
>    blkdev_put
>     blkdev_put_whole
>      if (atomic_dec_and_test(&bdev->bd_openers))
>      // false, because systemd-udev happens to have the device open, so
>      // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
>      // truncate_inode_folio->truncate_cleanup_folio->
>      // folio_invalidate->block_invalidate_folio->
>      // filemap_release_folio->try_to_free_buffers is skipped and
>      // dropping the buffer head is missed again.
>
> Second mount:
> ext4_fill_super
>  ext4_load_and_init_journal
>   ext4_load_journal
>    ext4_get_journal
>     jbd2_journal_init_inode
>      journal_init_common
>       bh = getblk_unmovable
>        bh = __find_get_block // finds the stale bh from the failed mount
>       journal->j_sb_buffer = bh
>    jbd2_journal_load
>     load_superblock
>      journal_get_superblock
>       if (buffer_verified(bh))
>       // true, so journal->j_format_version = 2 is skipped; it stays 0
>     jbd2_journal_recover
>      do_one_pass
>       next_log_block += count_tags(journal, bh)
>       // The 'tag_bytes' value from journal_tag_bytes() depends on
>       // jbd2_has_feature_csum3(), which returns false because
>       // 'j->j_format_version >= 2' does not hold, so next_log_block
>       // is computed incorrectly. do_one_pass() may then exit early
>       // when it finds a block without JBD2_MAGIC_NUMBER at 'next_log_block'.
>
> At this point the filesystem is corrupted: the journal is only partially
> replayed, and the new journal sequence number has actually already been
> used by the last mount.
>
> invalidate_bdev() can drop all buffer heads even when racing with raw
> reads of the block device (e.g. systemd-udev), so we can fix this by
> invalidating the bdev in the error handling path of __ext4_fill_super().
>
> A reproducer is available at [Link].
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
> Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming")
> Cc: stable@xxxxxxxxxxxxxxx # v3.5
> Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>

...

> @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
>  
>  	sync_blockdev(sb->s_bdev);
>  	invalidate_bdev(sb->s_bdev);
> -	if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
> -		/*
> -		 * Invalidate the journal device's buffers.  We don't want them
> -		 * floating about in memory - the physical journal device may
> -		 * hotswapped, and it breaks the `ro-after' testing code.
> -		 */
> +	if (sbi->s_journal_bdev) {
>  		sync_blockdev(sbi->s_journal_bdev);
> -		invalidate_bdev(sbi->s_journal_bdev);
>  		ext4_blkdev_remove(sbi);
>  	}

Hum, but this will invalidate bhs only if the journal is stored on a
separate block device. If the journal is in an inode (the common case), we
won't invalidate anything (sbi->s_journal_bdev is NULL), so the same
problem can still happen?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR