OK, some more updates.

First of all, the e2fsck hang in the ext4/adv case is an inline_data bug in e2fsck/pass2.c:check_dir_block(); the code is clearly buggy, and I'll be sending out a fix in the next day or two. I still don't understand why this patch series changes the kernel behaviour enough to change the resulting file system in such a way as to unmask this bug. The bug is triggered by file system corruption, so the question is whether this patch series is somehow causing the file system to be more corrupted than it otherwise would be. I'm not sure.

However, the ext4/ext3 hang *is* a real hang in kernel space, and generic/475 is not completing because the kernel seems to have ended up deadlocking somehow. With just the first patch in this patch series ("jbd2: recheck chechpointing non-dirty buffer") applied, we're getting a kernel NULL pointer dereference:

[ 310.447568] EXT4-fs error (device dm-7): ext4_check_bdev_write_error:223: comm fsstress: Error while async write back metadata
[ 310.458038] EXT4-fs error (device dm-7): __ext4_get_inode_loc_noinmem:4467: inode #99400: block 393286: comm fsstress: unable to read itable block
[ 310.458421] JBD2: IO error reading journal superblock
[ 310.484755] EXT4-fs warning (device dm-7): ext4_end_bio:343: I/O error 10 writing to inode 36066 starting block 19083)
[ 310.490956] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 310.490959] #PF: supervisor write access in kernel mode
[ 310.490961] #PF: error_code(0x0002) - not-present page
[ 310.490963] PGD 0 P4D 0
[ 310.490966] Oops: 0002 [#1] PREEMPT SMP PTI
[ 310.490970] CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
[ 310.490974] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
[ 310.490976] RIP: 0010:jbd2_journal_set_features+0x13d/0x430
[ 310.490985] Code: 0f 94 c0 44 20 e8 0f 85 e0 00 00 00 be 94 01 00 00 48 c7 c7 a1 33 59 b4 48 89 0c 24 4c 8b 7d 38 e8 a8 dc c5 ff 2e 2e 2e 31 c0 <f0> 49 0f ba 2f 02 48 8b 0c 24 0f 82 24 02 00 00 45 84 ed 8b 41 28
[ 310.490988] RSP: 0018:ffffb9b441043b30 EFLAGS: 00010246
[ 310.490990] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8edb447b8100
[ 310.490993] RDX: 0000000000000000 RSI: 0000000000000194 RDI: ffffffffb45933a1
[ 310.490994] RBP: ffff8edb45a62800 R08: ffffffffb460d6c0 R09: 0000000000000000
[ 310.490996] R10: 204f49203a324442 R11: 4f49203a3244424a R12: 0000000000000000
[ 310.490997] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 310.490999] FS:  00007f2940cca740(0000) GS:ffff8edc19500000(0000) knlGS:0000000000000000
[ 310.491005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 310.491007] CR2: 0000000000000000 CR3: 000000012543e003 CR4: 00000000003706e0
[ 310.491009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 310.491011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 310.491012] Call Trace:
[ 310.491016]  <TASK>
[ 310.491019]  ? __die+0x23/0x60
[ 310.491025]  ? page_fault_oops+0xa4/0x170
[ 310.491029]  ? exc_page_fault+0x67/0x170
[ 310.491032]  ? asm_exc_page_fault+0x26/0x30
[ 310.491039]  ? jbd2_journal_set_features+0x13d/0x430
[ 310.491043]  jbd2_journal_revoke+0x47/0x1e0
[ 310.491046]  __ext4_forget+0xc3/0x1b0
[ 310.491051]  ext4_free_blocks+0x214/0x2f0
[ 310.491056]  ext4_free_branches+0xeb/0x270
[ 310.491061]  ext4_ind_truncate+0x2bf/0x320
[ 310.491065]  ext4_truncate+0x1e4/0x490
[ 310.491069]  ext4_handle_inode_extension+0x1bd/0x2a0
[ 310.491073]  ? iomap_dio_complete+0xaf/0x1d0
[ 310.511141] ------------[ cut here ]------------
[ 310.516121]  ext4_dio_write_iter+0x346/0x3e0
[ 310.516132]  ? __handle_mm_fault+0x171/0x200
[ 310.516135]  vfs_write+0x21a/0x3e0
[ 310.516140]  ksys_write+0x6f/0xf0
[ 310.516142]  do_syscall_64+0x3b/0x90
[ 310.516147]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 310.516154] RIP: 0033:0x7f2940eb2fb3
[ 310.516158] Code: 75 05 48 83 c4 58 c3 e8 cb 41 ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[ 310.516161] RSP: 002b:00007ffe9a322cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 310.516165] RAX: ffffffffffffffda RBX: 0000000000003000 RCX: 00007f2940eb2fb3
[ 310.516167] RDX: 0000000000003000 RSI: 0000556ba1e31000 RDI: 0000000000000003
[ 310.516168] RBP: 0000000000000003 R08: 0000556ba1e31000 R09: 00007f2940e9bbe0
[ 310.516170] R10: 0000556b9fedbf59 R11: 0000000000000246 R12: 0000000000000024
[ 310.516172] R13: 00000000000cf000 R14: 0000556ba1e31000 R15: 0000000000000000
[ 310.516174]  </TASK>
[ 310.516178] CR2: 0000000000000000
[ 310.516181] ---[ end trace 0000000000000000 ]---
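For what it's worth, if I'm decoding the Code: bytes correctly, the faulting instruction is "lock btsq $0x2,(%r15)" with R15 == 0, i.e. an atomic set of bit 2 through a NULL pointer. Anyone who wants to double-check the disassembly can run the oops through scripts/decodecode in the kernel tree; a quick sketch, assuming the oops text above (including the Code: line) has been saved to oops.txt, which is just an example file name:

    # Run from the top of a kernel source tree; decodecode extracts the
    # "Code:" line from the pasted oops, disassembles it, and marks the
    # trapping instruction.
    ./scripts/decodecode < oops.txt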
This is then causing fsstress to wedge:

# ps -ax -o pid,user,wchan:20,args --sort pid
    PID USER     WCHAN                COMMAND
    ...
  12860 root     do_wait              /bin/bash /root/xfstests/tests/generic/475
  13086 root     rescuer_thread       [kdmflush/253:7]
  15593 root     rescuer_thread       [ext4-rsv-conver]
  15598 root     jbd2_log_wait_commit ./ltp/fsstress -d /xt-vdc -n 999999 -p 4
  15600 root     ext4_release_file    [fsstress]
  15601 root     exit_aio             [fsstress]

So at this point, I'm going to drop this entire patch series from the dev tree, since this *does* seem to be some kind of regression triggered by the first patch in the patch series.

Regards,

					- Ted
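P.S.  For anyone trying to reproduce this: the wchan column above only shows where each task is sleeping. A quick sketch of how to grab the full kernel stacks of the wedged tasks (assumes sysrq is enabled and the kernel has CONFIG_STACKTRACE; pid 15598 is just the example from the listing above):

    # Dump stack traces of all tasks in uninterruptible (blocked) state
    # into the kernel log:
    echo w > /proc/sysrq-trigger

    # Or grab the kernel stack of a single stuck task:
    cat /proc/15598/stack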