Re: [PATCH] block: fix request.queuelist usage in flush

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/06/2024 10:44, Friedrich Weber wrote:
> On 05/06/2024 16:27, Chengming Zhou wrote:
>> On 2024/6/5 21:34, Friedrich Weber wrote:
>>> On 05/06/2024 12:54, Friedrich Weber wrote:
>>> [...]
>>>
>>> My results:
>>>
>>> Booting the Debian (virtual) machine with mainline kernel v6.10-rc2
>>> (c3f38fa61af77b49866b006939479069cd451173):
>>> works fine, no crash
>>>
>>> Booting the Debian (virtual) machine with patch "block: fix
>>> request.queuelist usage in flush" applied on top of v6.10-rc2: The
>>> Debian (virtual) machine crashes during boot with [1].
>>>
>>> Hope this helps! If I can provide anything else, just let me know.
>>
>> Thanks for your help, I still can't reproduce it myself, don't know why.
> 
> Weird -- when booting the Debian machine into mainline kernel v6.10-rc2
> with "block: fix request.queuelist usage in flush" applied on top, it
> crashes reliably for me. The machine having its root on LVM seems to be
> essential to reproduce the crash, though.
> 
> Maybe the fact that I'm running the Debian machine virtualized makes the
> crash more likely to trigger. I'll try to reproduce on bare metal to
> narrow down the reproducer and get back to you.

The crashing Debian VM (QEMU/KVM) has its root on an LVM Logical Volume.
As it turns out, whether there is an in-guest kernel crash on boot
depends on the cache mode of the disk backing the LVM Physical Volume
(Christoph also mentioned the write cache being relevant for the
original issue with software RAID [1]). Steps to reproduce:

- Install kernel v6.10-rc2 with "block: fix request.queuelist usage in
flush" applied inside the Debian VM

- Start the Debian VM with `cache=writethrough` for its disk (see [2]
for upstream QEMU 8.2.2 command line)
  => no in-guest crash on boot

- Start the VM with `cache=writeback` for its disk (see [3] for diff to
previous command line)
  => in-guest crash [4] on boot

Maybe the cache=writeback was missing to reproduce the crash on your
end? If you still cannot reproduce the crash, let me know -- I'll
provide more detailed steps how to generate the VM disk then.

I was also able to reproduce the crash on bare metal, with a Proxmox VE
8.2 installation -- if needed I can also retry with e.g. a bare-metal
Debian installation.

However, also here the write cache seems to be relevant:

- Running kernel is again mainline v6.10-rc2 with "block: fix
request.queuelist usage in flush" applied. root is on LVM (see [5]).

- If I enable write caching for the disk backing the LVM PV via `hdparm
-W1 /dev/sdb` and reboot, I see the crash in blk_mq_requeue_work on boot.

- If I disable write caching via `hdparm -W0 /dev/sdb` and reboot, I see
no crash.

Hope this helps. Let me know if I can provide any more information.

Best,

Friedrich

[1] https://lore.kernel.org/all/20240531061708.GB18075@xxxxxx/
[2]

[...]/qemu-stable-8.2/qemu-8.2.2/build/qemu-system-x86_64 \
  -accel kvm \
  -name 'kernel684deb,debug-threads=on' \
  -chardev
'socket,id=qmp,path=/var/run/qemu-server/198.qmp,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/198.pid \
  -smbios 'type=1,uuid=49351322-c4a0-420b-a780-d445a638973a' \
  -smp '1,sockets=1,cores=1,maxcpus=1' \
  -nodefaults \
  -vnc 'unix:/var/run/qemu-server/198.vnc,password=on' \
  -cpu qemu64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt \
  -m 8192 \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'vmgenid,guid=78d4ee6d-a73a-43bd-9e3a-5395449af862' \
  -chardev
'socket,id=serial0,path=/var/run/qemu-server/198.serial0,server=on,wait=off'
\
  -device 'isa-serial,chardev=serial0' \
  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
  -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
  -drive
'file=/dev/pve/vm-198-disk-0,if=none,id=drive-sata0,format=raw,aio=threads,detect-zeroes=on,cache=writethrough'
\
  -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' \
  -machine 'type=pc'

[3]
--- run-198-nocrash.sh	2024-06-06 17:19:10.725256488 +0200
+++ run-198-crash.sh	2024-06-06 17:18:39.444974266 +0200
@@ -19,6 +19,6 @@
   -device 'isa-serial,chardev=serial0' \
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
-  -drive
'file=/dev/pve/vm-198-disk-0,if=none,id=drive-sata0,format=raw,aio=threads,detect-zeroes=on,cache=writethrough'
\
+  -drive
'file=/dev/pve/vm-198-disk-0,if=none,id=drive-sata0,format=raw,aio=threads,detect-zeroes=on,cache=writeback'
\
   -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' \
   -machine 'type=pc'

[4]
[    1.911631] BUG: kernel NULL pointer dereference, address:
0000000000000000
[    1.912249] #PF: supervisor write access in kernel mode
[    1.912723] #PF: error_code(0x0002) - not-present page
[    1.913157] PGD 0 P4D 0
[    1.913405] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
[    1.913787] CPU: 0 PID: 45 Comm: kworker/0:1H Tainted: G            E
     6.10.0-rc2-patch0604-6-10rc2+ #37
[    1.915161] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    1.916098] Workqueue: kblockd blk_mq_requeue_work
[    1.916501] RIP: 0010:_raw_spin_lock+0x13/0x60
[    1.916895] Code: 31 db c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 3c 42 4a 59 31 c0 ba 01 00 00
00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9
[    1.918434] RSP: 0000:ffffa31fc0177d78 EFLAGS: 00010246
[    1.918880] RAX: 0000000000000000 RBX: ffff8dcf4dcd0400 RCX:
00000000ffffffe0
[    1.919393] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
0000000000000000
[    1.920014] RBP: ffffa31fc0177d98 R08: 0000000000000000 R09:
0000000000000000
[    1.920615] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[    1.921216] R13: 0000000000000001 R14: ffffc31fbfc013c0 R15:
ffff8dcf4dcd3c54
[    1.921811] FS:  0000000000000000(0000) GS:ffff8dd077c00000(0000)
knlGS:0000000000000000
[    1.922471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.922964] CR2: 0000000000000000 CR3: 000000010235a000 CR4:
00000000000006f0
[    1.923566] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    1.924166] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    1.924768] Call Trace:
[    1.924968]  <TASK>
[    1.925120]  ? show_regs+0x6c/0x80
[    1.925366]  ? __die+0x24/0x80
[    1.925580]  ? page_fault_oops+0x175/0x5e0
[    1.925862]  ? _raw_spin_lock+0x13/0x60
[    1.926127]  ? kernelmode_fixup_or_oops.constprop.0+0x69/0x90
[    1.926523]  ? __bad_area_nosemaphore+0x19f/0x280
[    1.926844]  ? bad_area_nosemaphore+0x16/0x30
[    1.927143]  ? do_user_addr_fault+0x2ce/0x690
[    1.927450]  ? exc_page_fault+0x82/0x1b0
[    1.927719]  ? asm_exc_page_fault+0x27/0x30
[    1.928006]  ? _raw_spin_lock+0x13/0x60
[    1.928273]  ? blk_mq_request_bypass_insert+0x20/0xe0
[    1.928617]  blk_mq_insert_request+0x120/0x1e0
[    1.928922]  blk_mq_requeue_work+0x18f/0x230
[    1.929216]  process_one_work+0x199/0x3e0
[    1.929496]  worker_thread+0x32a/0x500
[    1.929754]  ? __pfx_worker_thread+0x10/0x10
[    1.930047]  kthread+0xe4/0x110
[    1.930271]  ? __pfx_kthread+0x10/0x10
[    1.930531]  ret_from_fork+0x47/0x70
[    1.930781]  ? __pfx_kthread+0x10/0x10
[    1.931040]  ret_from_fork_asm+0x1a/0x30
[    1.931318]  </TASK>
[    1.931473] Modules linked in: efi_pstore(E) dmi_sysfs(E)
qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) psmouse(E) bochs(E)
drm_vram_helper(E) drm_ttm_helper(E) ahci(E) ttm(E) libahci(E)
i2c_piix4(E) pata_acpi(E) floppy(E)
[    1.932778] CR2: 0000000000000000
[    1.932993] ---[ end trace 0000000000000000 ]---
[    1.933289] RIP: 0010:_raw_spin_lock+0x13/0x60
[    1.933572] Code: 31 db c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 3c 42 4a 59 31 c0 ba 01 00 00
00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9
[    1.934720] RSP: 0000:ffffa31fc0177d78 EFLAGS: 00010246
[    1.935049] RAX: 0000000000000000 RBX: ffff8dcf4dcd0400 RCX:
00000000ffffffe0
[    1.935510] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
0000000000000000
[    1.935967] RBP: ffffa31fc0177d98 R08: 0000000000000000 R09:
0000000000000000
[    1.936437] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[    1.937001] R13: 0000000000000001 R14: ffffc31fbfc013c0 R15:
ffff8dcf4dcd3c54
[    1.937481] FS:  0000000000000000(0000) GS:ffff8dd077c00000(0000)
knlGS:0000000000000000
[    1.937995] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.938599] CR2: 0000000000000000 CR3: 000000010235a000 CR4:
00000000000006f0
[    1.939320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    1.940067] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400

[5]
# lsblk --ascii
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                  8:0    1 465.8G  0 disk
sdb                  8:16   1 119.2G  0 disk
|-sdb1               8:17   1  1007K  0 part
|-sdb2               8:18   1     1G  0 part /boot/efi
`-sdb3               8:19   1 118.2G  0 part
  |-pve-swap       252:0    0     8G  0 lvm  [SWAP]
  |-pve-root       252:2    0  39.6G  0 lvm  /
  |-pve-data_tmeta 252:3    0     1G  0 lvm
  | `-pve-data     252:5    0  53.9G  0 lvm
  `-pve-data_tdata 252:4    0  53.9G  0 lvm
    `-pve-data     252:5    0  53.9G  0 lvm
sdc                  8:32   1 476.9G  0 disk
sdd                  8:48   1 446.9G  0 disk
# pvs
  PV         VG  Fmt  Attr PSize    PFree
  /dev/sdb3  pve lvm2 a--  <118.24g 14.75g
# vgs
  VG  #PV #LV #SN Attr   VSize    VFree
  pve   1   3   0 wz--n- <118.24g 14.75g
# lvs
  LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log
Cpy%Sync Convert
  data pve twi-a-tz-- <53.93g             0.00   1.59
  root pve -wi-ao---- <39.56g
  swap pve -wi-ao----   8.00g
# findmnt -ascii
TARGET    SOURCE               FSTYPE OPTIONS
/         /dev/mapper/pve-root ext4   errors=remount-ro
/boot/efi UUID=6314-15A9       vfat   defaults
none      /dev/mapper/pve-swap swap   sw
/proc     proc                 proc   defaults





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux