nvme-pci: NULL pointer dereference in nvme_dev_disable() on linux-next

Hi,

our internal s390 CI pointed us to a possible race (a use-after-free or
similar) in drivers/nvme/host/pci.c: one of the tests ended in the following
kernel panic:

[ 1836.550881] nvme nvme0: pci function 0004:00:00.0
[ 1836.563814] nvme nvme0: Shutdown timeout set to 15 seconds
[ 1836.569587] nvme nvme0: 63/0/0 default/read/poll queues
[ 1836.577114]  nvme0n1: p1 p2
[ 1861.856726] nvme nvme0: pci function 0004:00:00.0
[ 1861.869539] nvme nvme0: failed to mark controller CONNECTING
[ 1861.869542] nvme nvme0: Removing after probe failure status: -16
[ 1861.869552] Unable to handle kernel pointer dereference in virtual kernel address space
[ 1861.869554] Failing address: 0000000000000000 TEID: 0000000000000483
[ 1861.869555] Fault in home space mode while using kernel ASCE.
[ 1861.869558] AS:0000000135c4c007 R3:00000003fffe0007 S:00000003fffe6000 P:000000000000013d 
[ 1861.869587] Oops: 0004 ilc:3 [#1] SMP 
[ 1861.869591] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
nfnetlink mlx5_ib ib_uverbs uvdevice s390_trng ib_core vfio_ccw mdev vfio_iommu_type1 eadm_sch
vfio sch_fq_codel configfs dm_service_time mlx5_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes
sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 nvme sha_common nvme_core zfcp scsi_transport_fc
dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log pkey zcrypt
rng_core autofs4
[ 1861.869627] CPU: 4 PID: 2929 Comm: kworker/u800:0 Not tainted 6.1.0-rc3-next-20221104 #4
[ 1861.869630] Hardware name: IBM 3931 A01 701 (LPAR)
[ 1861.869631] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 1861.869637] Krnl PSW : 0704c00180000000 0000000134f026d0 (mutex_lock+0x10/0x28)
[ 1861.869643]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 1861.869646] Krnl GPRS: 0000000001000000 0000000000000000 0000000000000078 00000000a5f8c200
[ 1861.869648]            000003800309601c 0000000000000004 0000000000000000 0000000088e64220
[ 1861.869650]            0000000000000078 0000000000000000 0000000000000098 0000000088e64000
[ 1861.869651]            00000000a5f8c200 0000000088e641e0 00000001349bdac2 0000038003ea7c20
[ 1861.869658] Krnl Code: 0000000134f026c0: c0040008cfb8        brcl    0,000000013501c630
[ 1861.869658]            0000000134f026c6: a7190000            lghi    %r1,0
[ 1861.869658]           #0000000134f026ca: e33003400004        lg      %r3,832
[ 1861.869658]           >0000000134f026d0: eb1320000030        csg     %r1,%r3,0(%r2)
[ 1861.869658]            0000000134f026d6: ec160006007c        cgij    %r1,0,6,0000000134f026e2
[ 1861.869658]            0000000134f026dc: 07fe                bcr     15,%r14
[ 1861.869658]            0000000134f026de: 47000700            bc      0,1792
[ 1861.869658]            0000000134f026e2: c0f4ffffffe7        brcl    15,0000000134f026b0
[ 1861.869715] Call Trace:
[ 1861.869716]  [<0000000134f026d0>] mutex_lock+0x10/0x28 
[ 1861.869719]  [<000003ff7fc381d6>] nvme_dev_disable+0x1b6/0x2b0 [nvme] 
[ 1861.869722]  [<000003ff7fc3929e>] nvme_reset_work+0x49e/0x6a0 [nvme] 
[ 1861.869724]  [<0000000134309158>] process_one_work+0x200/0x458 
[ 1861.869730]  [<00000001343098e6>] worker_thread+0x66/0x480 
[ 1861.869732]  [<0000000134312888>] kthread+0x108/0x110 
[ 1861.869735]  [<0000000134297354>] __ret_from_fork+0x3c/0x58 
[ 1861.869738]  [<0000000134f074ea>] ret_from_fork+0xa/0x40 
[ 1861.869740] Last Breaking-Event-Address:
[ 1861.869741]  [<00000001349bdabc>] blk_mq_quiesce_tagset+0x2c/0xc0
[ 1861.869747] Kernel panic - not syncing: Fatal exception: panic_on_oops

On a stock linux-next kernel (tag next-20221104, see
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tag/?h=next-20221104)
we can reproduce this at will with the following small script:

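#!/usr/bin/env bash

# $1 is the PCI address of the NVMe device.
# Unbind the nvme driver, rebind it, then immediately hot-remove the device.
echo "$1" > /sys/bus/pci/drivers/nvme/unbind
echo "$1" > /sys/bus/pci/drivers/nvme/bind
echo 1 > "/sys/bus/pci/devices/$1/remove"
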
where $1 is the PCI address of the NVMe drive (0004:00:00.0 in the log above).
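
For anyone who wants to dry-run the sequence before pointing it at real
hardware, here is a minimal sketch of the same three writes wrapped in
functions. The names valid_pci_addr, reproduce and SYSFS_ROOT are made up for
illustration and are not part of our test setup; SYSFS_ROOT only exists so the
writes can be redirected to a scratch directory.

```shell
#!/usr/bin/env bash
# Illustrative wrapper around the three sysfs writes; the function names and
# SYSFS_ROOT are assumptions for this sketch, not taken from the CI test.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Accept only a full PCI address of the form domain:bus:device.function,
# e.g. 0004:00:00.0.
valid_pci_addr() {
    local re='^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
    [[ "$1" =~ $re ]]
}

# Unbind, rebind and remove the device back to back, with no delay between
# the writes and no error check between them, as in the original script.
reproduce() {
    local addr="$1"
    valid_pci_addr "$addr" || { echo "invalid PCI address: $addr" >&2; return 2; }
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers/nvme/unbind"
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers/nvme/bind"
    echo 1 > "$SYSFS_ROOT/bus/pci/devices/$addr/remove"
}
```

After sourcing the file, "reproduce 0004:00:00.0" (as root) issues the writes
against /sys; setting SYSFS_ROOT to a temporary directory that contains the
same subdirectories turns the call into a harmless dry run.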

We believe this to be a race condition, since the same sequence does not
produce the panic when the commands are entered interactively, one at a time.

Could this be related to the recent refactoring work by Christoph Hellwig, e.g.
https://lore.kernel.org/all/20221101150050.3510-3-hch@xxxxxx/ ?

Thank you,
Gerd Bayer