Re: [PATCH] PCI: Data corruption happening due to race condition

On Mon, Jun 25, 2018 at 4:45 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> On Mon, Jun 25, 2018 at 04:27:37PM +0530, Hari Vyas wrote:
>>       This issue is happening  with multiple times device removal and
>> rescan from sysfs. Card is not removed physically.
>>       Is_added bit is set after device attach which probe nvme driver.
>> NVMe driver starts one workqueue and that one is calling pci_set_master()
>> to set is_busmaster bit.
>>       With multiple times device removal and rescan from sysfs,  race
>> condition is observed and is_added bit is over-written to 0 from workqueue
>> started by NVMe driver.
>
> Could you add a dump_stack() to pci_bus_add_device() and pci_stop_dev()
> where the is_added bit is modified, reproduce the issue and attach the
> resulting dmesg output to a newly opened bug on bugzilla.kernel.org?
>

I have raised Bug 200283 - PCI: Data corruption happening due to a
race condition.
Please note that the is_added bit is lost between the
pci_bus_add_device() and __pci_set_master() functions. I added
dump_stack() to both, with one additional debug message to print the
CPU id. The output below shows that pci_set_master() is called from a
workqueue on CPU3 while pci_bus_add_device() is called from the sysfs
rescan on CPU2.

root@bcm958742k:~# echo 1 > /sys/bus/pci/devices/0002\:01\:00.0/remove
[   32.385389] nvme nvme0: failed to set APST feature (-19)
root@bcm958742k:~# echo 1 > /sys/bus/pci/rescan
[   38.916435] pci 0002:01:00.0: BAR 0: assigned [mem 0x500000000-0x500003fff 64bit]
[   38.924822] nvme nvme0: pci function 0002:01:00.0

[   38.929702] nvme 0002:01:00.0: pci_bus_add_device:333 cpu=2


[   38.929705] CPU: 2 PID: 2360 Comm: sh Not tainted 4.17.0-02102-gdcfa25a-dirty #106
[   38.943259] Hardware name: Stingray Combo SVK (BCM958742K) (DT)


[   38.943267] nvme 0002:01:00.0: __pci_set_master:3681 cpu=3


[   38.949366] Call trace:
[   38.949375]  dump_backtrace+0x0/0x1b8
[   38.949377]  show_stack+0x14/0x1c
[   38.964748]  dump_stack+0x90/0xb0
[   38.968168]  pci_bus_add_device+0xbc/0xe0
[   38.972303]  pci_bus_add_devices+0x44/0x90
[   38.976527]  pci_bus_add_devices+0x74/0x90
[   38.980751]  pci_rescan_bus+0x2c/0x3c
[   38.984529]  bus_rescan_store+0x7c/0xa0
[   38.988485]  bus_attr_store+0x20/0x34
[   38.992264]  sysfs_kf_write+0x40/0x50
[   38.996040]  kernfs_fop_write+0xcc/0x1cc
[   39.000087]  __vfs_write+0x40/0x154
[   39.003683]  vfs_write+0xa8/0x198
[   39.007102]  ksys_write+0x58/0xbc
[   39.010520]  sys_write+0xc/0x14
[   39.013758]  __sys_trace_return+0x0/0x4

[   39.017714] CPU: 3 PID: 50 Comm: kworker/u16:1 Not tainted 4.17.0-02102-gdcfa25a-dirty #106
[   39.026329] Hardware name: Stingray Combo SVK (BCM958742K) (DT)
[   39.026336] Workqueue: nvme-reset-wq nvme_reset_work
[   39.037561] Call trace:
[   39.037563]  dump_backtrace+0x0/0x1b8
[   39.037564]  show_stack+0x14/0x1c
[   39.037566]  dump_stack+0x90/0xb0
[   39.037569]  __pci_set_master+0xd4/0x130
[   39.043866]  pci_set_master+0x18/0x2c
[   39.043867]  nvme_reset_work+0x110/0x14a4
[   39.050702]  process_one_work+0x12c/0x29c
[   39.050704]  worker_thread+0x13c/0x410
[   39.058522]  kthread+0xfc/0x128
[   39.058524]  ret_from_fork+0x10/0x18
root@bcm958742k:~#


> Thanks,
>
> Lukas


