Hi,

our internal s390 CI pointed us to a potential racy use-after-free or
similar issue in drivers/nvme/host/pci.c by ending one of the tests in
the following kernel panic:

[ 1836.550881] nvme nvme0: pci function 0004:00:00.0
[ 1836.563814] nvme nvme0: Shutdown timeout set to 15 seconds
[ 1836.569587] nvme nvme0: 63/0/0 default/read/poll queues
[ 1836.577114] nvme0n1: p1 p2
[ 1861.856726] nvme nvme0: pci function 0004:00:00.0
[ 1861.869539] nvme nvme0: failed to mark controller CONNECTING
[ 1861.869542] nvme nvme0: Removing after probe failure status: -16
[ 1861.869552] Unable to handle kernel pointer dereference in virtual kernel address space
[ 1861.869554] Failing address: 0000000000000000 TEID: 0000000000000483
[ 1861.869555] Fault in home space mode while using kernel ASCE.
[ 1861.869558] AS:0000000135c4c007 R3:00000003fffe0007 S:00000003fffe6000 P:000000000000013d
[ 1861.869587] Oops: 0004 ilc:3 [#1] SMP
[ 1861.869591] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib ib_uverbs uvdevice s390_trng ib_core vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio sch_fq_codel configfs dm_service_time mlx5_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 nvme sha_common nvme_core zfcp scsi_transport_fc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log pkey zcrypt rng_core autofs4
[ 1861.869627] CPU: 4 PID: 2929 Comm: kworker/u800:0 Not tainted 6.1.0-rc3-next-20221104 #4
[ 1861.869630] Hardware name: IBM 3931 A01 701 (LPAR)
[ 1861.869631] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 1861.869637] Krnl PSW : 0704c00180000000 0000000134f026d0 (mutex_lock+0x10/0x28)
[ 1861.869643]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 1861.869646] Krnl GPRS: 0000000001000000 0000000000000000 0000000000000078 00000000a5f8c200
[ 1861.869648]            000003800309601c 0000000000000004 0000000000000000 0000000088e64220
[ 1861.869650]            0000000000000078 0000000000000000 0000000000000098 0000000088e64000
[ 1861.869651]            00000000a5f8c200 0000000088e641e0 00000001349bdac2 0000038003ea7c20
[ 1861.869658] Krnl Code: 0000000134f026c0: c0040008cfb8   brcl  0,000000013501c630
[ 1861.869658]            0000000134f026c6: a7190000       lghi  %r1,0
[ 1861.869658]           #0000000134f026ca: e33003400004   lg    %r3,832
[ 1861.869658]           >0000000134f026d0: eb1320000030   csg   %r1,%r3,0(%r2)
[ 1861.869658]            0000000134f026d6: ec160006007c   cgij  %r1,0,6,0000000134f026e2
[ 1861.869658]            0000000134f026dc: 07fe           bcr   15,%r14
[ 1861.869658]            0000000134f026de: 47000700       bc    0,1792
[ 1861.869658]            0000000134f026e2: c0f4ffffffe7   brcl  15,0000000134f026b0
[ 1861.869715] Call Trace:
[ 1861.869716]  [<0000000134f026d0>] mutex_lock+0x10/0x28
[ 1861.869719]  [<000003ff7fc381d6>] nvme_dev_disable+0x1b6/0x2b0 [nvme]
[ 1861.869722]  [<000003ff7fc3929e>] nvme_reset_work+0x49e/0x6a0 [nvme]
[ 1861.869724]  [<0000000134309158>] process_one_work+0x200/0x458
[ 1861.869730]  [<00000001343098e6>] worker_thread+0x66/0x480
[ 1861.869732]  [<0000000134312888>] kthread+0x108/0x110
[ 1861.869735]  [<0000000134297354>] __ret_from_fork+0x3c/0x58
[ 1861.869738]  [<0000000134f074ea>] ret_from_fork+0xa/0x40
[ 1861.869740] Last Breaking-Event-Address:
[ 1861.869741]  [<00000001349bdabc>] blk_mq_quiesce_tagset+0x2c/0xc0
[ 1861.869747] Kernel panic - not syncing: Fatal exception: panic_on_oops

Judging from the register contents and the Last Breaking-Event-Address,
the mutex that mutex_lock() operated on sits at address 0x78 (see %r2),
i.e. a small field offset from a NULL base pointer. That looks
consistent with blk_mq_quiesce_tagset() being handed a tag_set that was
already gone when nvme_dev_disable() ran from the reset work.

On a stock kernel from
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tag/?h=next-20221104
we have been able to reproduce this at will with this small script:

#!/usr/bin/env bash
echo $1 > /sys/bus/pci/drivers/nvme/unbind
echo $1 > /sys/bus/pci/drivers/nvme/bind
echo 1 > /sys/bus/pci/devices/$1/remove

when filling in the NVMe drive's PCI identifier as $1.
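For completeness, this is how we drive it; the script file name below is
just a placeholder of ours, and the PCI address is the NVMe function
from the log above:

# placeholder name for the script above
chmod +x nvme-repro.sh
# run non-interactively, so the three sysfs writes happen back-to-back
./nvme-repro.sh 0004:00:00.0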
We believe this to be a race condition somewhere, since the same
sequence does not produce the panic when the three commands are entered
interactively, one at a time.

Could this be linked to the recent refactoring work by Christoph
Hellwig? E.g.
https://lore.kernel.org/all/20221101150050.3510-3-hch@xxxxxx/

Thank you,
Gerd Bayer