[AMD Official Use Only] Yes , according to the spec the AtomicOpsCtl bit is reserved in VF which means driver shouldn't touch it . The thing confused us is PF value applies to all associate VFs. we actually did some experiment try to read it in VF and want to use it to check whether the atomicOps is enabled ,we found that read the AtomicOpsCtl will always return 0 in Vf although PF already set it. We also verified the KFDTest.atomic test passed in VF when the PF enabled the atomicOps. So we think the VF already applied the setting but the valuer can't be read . I'm preparing a change that for SRIOV setup, the guest driver will not try to enable the atomicOps, it will try to get the enable information from host either through private communication channel or data exchange region between host and guest. Host driver (for the PF) will try to enable the atomicOps with the pci_enable_atomic_ops_to_root on KVM or implement the similar functionality if the host OS doesn't support this. Regards shaoyun.liu -----Original Message----- From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> Sent: Wednesday, September 8, 2021 12:03 PM To: Bjorn Helgaas <helgaas@xxxxxxxxxx>; Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>; Liu, Shaoyun <Shaoyun.Liu@xxxxxxx> Cc: linux-pci@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Jason Gunthorpe <jgg@xxxxxxxx>; Doug Ledford <dledford@xxxxxxxxxx>; Andrew Gospodarek <andrew.gospodarek@xxxxxxxxxxxx>; Michael Chan <michael.chan@xxxxxxxxxxxx>; Devesh Sharma <devesh.sharma@xxxxxxxxxxxx>; Cornwall, Jay <Jay.Cornwall@xxxxxxx> Subject: Re: crash observed with pci_enable_atomic_ops_to_root on VF devices. Am 2021-09-08 um 11:51 a.m. schrieb Bjorn Helgaas: > [+cc Devesh, Jay, Felix] > > On Wed, Sep 01, 2021 at 06:41:50PM +0530, Selvin Xavier wrote: >> Hi all, >> >> A recent patch merged to 5.14 in the Broadcom RDMA driver to call >> pci_enable_atomic_ops_to_root crashes the host while creating VFs. >> The crash is seen when pci_enable_atomic_ops_to_root is called with a >> VF pci device. pdev->bus->self is NULL. Is this expected for VF? > Sorry I missed this before. I think you're referring to 35f5ace5dea4 > ("RDMA/bnxt_re: Enable global atomic ops if platform supports") [1], > so I cc'd Devesh (the author). > > It *is* expected that virtual buses added for SR-IOV have > bus->self == NULL, but I don't think adding a check for that is > sufficient. > > The AtomicOp Requester Enable bit is in the Device Control 2 register, > and per PCIe r5.0, sec 9.3.5.10, it is reserved in VFs and the PF > value applies to all associated VFs. > > pci_enable_atomic_ops_to_root() does not appear to take that into > account, so I also cc'd Jay and Felix, the authors of 430a23689dea > ("PCI: Add pci_enable_atomic_ops_to_root()") [2]. > > It looks like we need to enable AtomicOps in the *PF*, not in the VF. > Maybe that means pci_enable_atomic_ops_to_root() should return failure > when called on a VF, and it should be up to the driver to call it on > the PF instead? I'm not an expert on how VFs are used, but I don't > like the idea of device B reaching out to change the configuration of > device A, especially when the change also affects devices C, D, E, ... Interesting timing. [+Shaoyun] is just working on SR-IOV problems with atomic operations these days. I think it makes sense for pci_enable_atomic_ops_to_root to fail on VFs. The guest driver either has to work without atomic ops, or it has to rely on side-band information from the host (PF) driver to know whether atomic ops are available. Regards, Felix > > Bjorn > > [1] https://git.kernel.org/linus/35f5ace5dea4 > [2] https://git.kernel.org/linus/430a23689dea > >> Here is the stack trace for your reference. >> crash> bt >> PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash" >> #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4 >> #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d >> #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d >> #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6 >> #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417 >> #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14 >> #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace >> [exception RIP: pcie_capability_read_dword+28] >> RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246 >> RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000 >> RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000 >> RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8 >> R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000 >> R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8 >> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 >> #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at >> ffffffffb95359a6 >> #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at >> ffffffffc08c1a33 [bnxt_re] >> #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re] >> #10 [ffff9a9481713a78] bnxt_re_netdev_event at ffffffffc08bab8f >> [bnxt_re] >> #11 [ffff9a9481713aa8] raw_notifier_call_chain at ffffffffb9102cbe >> #12 [ffff9a9481713ad0] register_netdevice at ffffffffb9803ff3 >> #13 [ffff9a9481713b08] register_netdev at ffffffffb980410a >> #14 [ffff9a9481713b18] bnxt_init_one at ffffffffc0349572 [bnxt_en] >> #15 [ffff9a9481713b70] local_pci_probe at ffffffffb953b92f >> #16 [ffff9a9481713ba0] pci_device_probe at ffffffffb953cf8f >> #17 [ffff9a9481713be8] really_probe at ffffffffb9659619 >> #18 [ffff9a9481713c08] __driver_probe_device at ffffffffb96598fb >> #19 [ffff9a9481713c28] driver_probe_device at ffffffffb965998f >> #20 [ffff9a9481713c48] __device_attach_driver at ffffffffb9659cd2 >> #21 [ffff9a9481713c70] bus_for_each_drv at ffffffffb9657307 >> #22 [ffff9a9481713ca8] __device_attach at ffffffffb96593e0 >> #23 [ffff9a9481713ce8] pci_bus_add_device at ffffffffb9530b7a >> #24 [ffff9a9481713d00] pci_iov_add_virtfn at ffffffffb955b1ca >> #25 [ffff9a9481713d40] sriov_enable at ffffffffb955b54b >> #26 [ffff9a9481713d90] bnxt_sriov_configure at ffffffffc034d913 >> [bnxt_en] >> #27 [ffff9a9481713dd8] sriov_numvfs_store at ffffffffb955acb4 >> #28 [ffff9a9481713e10] kernfs_fop_write_iter at ffffffffb93f09ad >> #29 [ffff9a9481713e48] new_sync_write at ffffffffb933b82c >> #30 [ffff9a9481713ed0] vfs_write at ffffffffb933db64 >> #31 [ffff9a9481713f00] ksys_write at ffffffffb933dd99 >> #32 [ffff9a9481713f38] do_syscall_64 at ffffffffb9a07897 >> #33 [ffff9a9481713f50] entry_SYSCALL_64_after_hwframe at ffffffffb9c0007c >> RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246 >> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648 >> RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001 >> RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580 >> R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0 >> R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002 >> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b >> >> Please suggest a fix for solving this issue. Is adding a NULL check >> for bus->self sounds okay? >> >> Thanks, >> Selvin