On 2023/6/14 13:42, Theodore Ts'o wrote:
> OK, some more updates. First of all, the e2fsck hang in the ext4/adv
> case is an inline_data bug in e2fsck/pass2.c:check_dir_block(); the
> code is clearly buggy, and I'll be sending out a fix in the next day
> or two.
>
> I still don't understand why this patch series is changing the kernel
> behaviour enough to change the resulting file system in such a way as
> to unmask this bug. The bug is triggered by file system corruption,
> so the question is whether this patch series is somehow causing the
> file system to be more corrupted than it otherwise would be. I'm not
> sure.
>
> However, the ext4/ext3 hang *is* a real hang in the kernel space, and
> generic/475 is not completing because the kernel seems to have ended
> up deadlocking somehow. With just the first patch in this patch
> series ("jbd2: recheck chechpointing non-dirty buffer") we're getting
> a kernel NULL pointer dereference:
>
> [ 310.447568] EXT4-fs error (device dm-7): ext4_check_bdev_write_error:223: comm fsstress: Error while async write back metadata
> [ 310.458038] EXT4-fs error (device dm-7): __ext4_get_inode_loc_noinmem:4467: inode #99400: block 393286: comm fsstress: unable to read itable block
> [ 310.458421] JBD2: IO error reading journal superblock
> [ 310.484755] EXT4-fs warning (device dm-7): ext4_end_bio:343: I/O error 10 writing to inode 36066 starting block 19083)
> [ 310.490956] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 310.490959] #PF: supervisor write access in kernel mode
> [ 310.490961] #PF: error_code(0x0002) - not-present page
> [ 310.490963] PGD 0 P4D 0
> [ 310.490966] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 310.490970] CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
> [ 310.490974] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
> [ 310.490976] RIP: 0010:jbd2_journal_set_features+0x13d/0x430
> [ 310.490985] Code: 0f 94 c0 44 20 e8 0f 85 e0 00 00 00 be 94 01 00 00 48 c7 c7 a1 33 59 b4 48 89 0c 24 4c 8b 7d 38 e8 a8 dc c5 ff 2e 2e 2e 31 c0 <f0> 49 0f ba 2f 02 48 8b 0c 24 0f 82 24 02 00 00 45 84 ed 8b 41 28
> [ 310.490988] RSP: 0018:ffffb9b441043b30 EFLAGS: 00010246
> [ 310.490990] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8edb447b8100
> [ 310.490993] RDX: 0000000000000000 RSI: 0000000000000194 RDI: ffffffffb45933a1
> [ 310.490994] RBP: ffff8edb45a62800 R08: ffffffffb460d6c0 R09: 0000000000000000
> [ 310.490996] R10: 204f49203a324442 R11: 4f49203a3244424a R12: 0000000000000000
> [ 310.490997] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [ 310.490999] FS: 00007f2940cca740(0000) GS:ffff8edc19500000(0000) knlGS:0000000000000000
> [ 310.491005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 310.491007] CR2: 0000000000000000 CR3: 000000012543e003 CR4: 00000000003706e0
> [ 310.491009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 310.491011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 310.491012] Call Trace:
> [ 310.491016] <TASK>
> [ 310.491019] ? __die+0x23/0x60
> [ 310.491025] ? page_fault_oops+0xa4/0x170
> [ 310.491029] ? exc_page_fault+0x67/0x170
> [ 310.491032] ? asm_exc_page_fault+0x26/0x30
> [ 310.491039] ? jbd2_journal_set_features+0x13d/0x430
> [ 310.491043] jbd2_journal_revoke+0x47/0x1e0
> [ 310.491046] __ext4_forget+0xc3/0x1b0
> [ 310.491051] ext4_free_blocks+0x214/0x2f0
> [ 310.491056] ext4_free_branches+0xeb/0x270
> [ 310.491061] ext4_ind_truncate+0x2bf/0x320
> [ 310.491065] ext4_truncate+0x1e4/0x490
> [ 310.491069] ext4_handle_inode_extension+0x1bd/0x2a0
> [ 310.491073] ? iomap_dio_complete+0xaf/0x1d0
> [ 310.511141] ------------[ cut here ]------------
> [ 310.516121] ext4_dio_write_iter+0x346/0x3e0
> [ 310.516132] ? __handle_mm_fault+0x171/0x200
> [ 310.516135] vfs_write+0x21a/0x3e0
> [ 310.516140] ksys_write+0x6f/0xf0
> [ 310.516142] do_syscall_64+0x3b/0x90
> [ 310.516147] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 310.516154] RIP: 0033:0x7f2940eb2fb3
> [ 310.516158] Code: 75 05 48 83 c4 58 c3 e8 cb 41 ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> [ 310.516161] RSP: 002b:00007ffe9a322cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 310.516165] RAX: ffffffffffffffda RBX: 0000000000003000 RCX: 00007f2940eb2fb3
> [ 310.516167] RDX: 0000000000003000 RSI: 0000556ba1e31000 RDI: 0000000000000003
> [ 310.516168] RBP: 0000000000000003 R08: 0000556ba1e31000 R09: 00007f2940e9bbe0
> [ 310.516170] R10: 0000556b9fedbf59 R11: 0000000000000246 R12: 0000000000000024
> [ 310.516172] R13: 00000000000cf000 R14: 0000556ba1e31000 R15: 0000000000000000
> [ 310.516174] </TASK>
> [ 310.516178] CR2: 0000000000000000
> [ 310.516181] ---[ end trace 0000000000000000 ]---
>

Sorry about the regression. I found that this issue was not introduced by
the first patch in this patch series ("jbd2: recheck chechpointing
non-dirty buffer"), but by d9eafe0afafa ("jbd2: factor out journal
initialization from journal_get_superblock()") [1] on your dev branch.

The problem is that the journal superblock failed to be written out due
to an I/O fault, so its uptodate bit was cleared by
end_buffer_write_sync() and had not yet been reset in
jbd2_write_superblock(). This raced with
jbd2_journal_revoke()->jbd2_journal_set_features()->
jbd2_journal_check_used_features()->journal_get_superblock()->bh_read().
Unfortunately, the read I/O also failed, so the error handling in
journal_fail_superblock() cleared journal->j_sb_buffer, which finally led
to the NULL pointer dereference above.

I think the fix could be just to move the buffer_verified(bh) check in
front of bh_read(). I can send out the fix after testing.
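The ordering problem described above can be replayed as a fixed sequence in user space. This is a minimal sketch, not kernel code: model_bh, model_journal, model_bh_read(), get_sb_read_first(), get_sb_verified_first() and replay_race() are hypothetical, heavily simplified stand-ins for struct buffer_head, journal_t, bh_read() and journal_get_superblock(), and no real locking or concurrency is modelled. It only contrasts "read, then check verified" with the proposed "check verified, then read" when the uptodate bit has been cleared by a failed write and the re-read also fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EIO 5 /* stand-in for the kernel's EIO */

/*
 * Hypothetical, heavily simplified stand-ins for struct buffer_head and
 * journal_t: only the state involved in the reported race is modelled.
 */
struct model_bh {
	bool uptodate;   /* cleared when a superblock write fails */
	bool verified;   /* set once the superblock has been validated */
	bool read_fails; /* make the next read return an I/O error */
};

struct model_journal {
	struct model_bh *j_sb_buffer;
};

/* Models bh_read(): no I/O if uptodate, otherwise a read that may fail. */
static int model_bh_read(struct model_bh *bh)
{
	if (bh->uptodate)
		return 0;
	if (bh->read_fails)
		return -MODEL_EIO;
	bh->uptodate = true;
	return 0;
}

/* Models journal_fail_superblock(): the journal drops its buffer. */
static void model_fail_superblock(struct model_journal *journal)
{
	journal->j_sb_buffer = NULL;
}

/* Models the validation step; the real superblock checks are elided. */
static int model_validate(struct model_bh *bh)
{
	bh->verified = true;
	return 0;
}

/* Ordering described for d9eafe0afafa: read first, then check verified. */
int get_sb_read_first(struct model_journal *journal)
{
	struct model_bh *bh = journal->j_sb_buffer;
	int err;

	assert(bh != NULL); /* mirrors J_ASSERT(bh != NULL) */
	err = model_bh_read(bh);
	if (err < 0) {
		model_fail_superblock(journal);
		return err;
	}
	if (bh->verified)
		return 0;
	return model_validate(bh);
}

/* Proposed ordering: an already verified superblock is never re-read. */
int get_sb_verified_first(struct model_journal *journal)
{
	struct model_bh *bh = journal->j_sb_buffer;
	int err;

	assert(bh != NULL);
	if (bh->verified)
		return 0;
	err = model_bh_read(bh);
	if (err < 0) {
		model_fail_superblock(journal);
		return err;
	}
	return model_validate(bh);
}

/*
 * Replays the reported sequence: the superblock was verified earlier, a
 * failed write cleared its uptodate bit, and the re-read fails as well.
 * Returns the error code and reports whether j_sb_buffer survived.
 */
int replay_race(bool verified_first, bool *sb_buffer_kept)
{
	struct model_bh bh = { .uptodate = false, .verified = true,
			       .read_fails = true };
	struct model_journal journal = { .j_sb_buffer = &bh };
	int err = verified_first ? get_sb_verified_first(&journal)
				 : get_sb_read_first(&journal);

	*sb_buffer_kept = journal.j_sb_buffer != NULL;
	return err;
}
```

With the read-first ordering, replay_race(false, &kept) returns -MODEL_EIO and leaves j_sb_buffer NULL, which, per the analysis above, is the state the caller in jbd2_journal_set_features() later trips over. With the verified-first ordering, replay_race(true, &kept) returns 0 and the buffer is kept, because no read is issued at all.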
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=d9eafe0afafaa519953735498c2a065d223c519b

Thanks,
Yi.

> This is then causing fsstress to wedge:
>
> # ps -ax -o pid,user,wchan:20,args --sort pid
>   PID USER     WCHAN                COMMAND
> ...
> 12860 root     do_wait              /bin/bash /root/xfstests/tests/generic/475
> 13086 root     rescuer_thread       [kdmflush/253:7]
> 15593 root     rescuer_thread       [ext4-rsv-conver]
> 15598 root     jbd2_log_wait_commit ./ltp/fsstress -d /xt-vdc -n 999999 -p 4
> 15600 root     ext4_release_file    [fsstress]
> 15601 root     exit_aio             [fsstress]
>
> So at this point, I'm going to drop this entire patch series from the
> dev tree, since this *does* seem to be some kind of regression
> triggered by the first patch in the patch series.
>
> Regards,
>
> - Ted
>