On Wed, Sep 1, 2021 at 5:20 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Aug 31, 2021 at 09:27:14PM +0530, Selvin Xavier wrote:
> > On Fri, Aug 27, 2021 at 6:01 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 09:15:38PM -0700, Selvin Xavier wrote:
> > > > Following Host crash is observed when pci_enable_atomic_ops_to_root
> > > > is called with VF PCI device.
> > > >
> > > > PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
> > > > #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
> > > > #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
> > > > #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
> > > > #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
> > > > #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
> > > > #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
> > > > #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
> > > > [exception RIP: pcie_capability_read_dword+28]
> > > > RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
> > > > RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
> > > > RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
> > > > RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
> > > > R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
> > > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
> > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > > #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
> > > > #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
> > > > #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
> > > > RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
> > > > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
> > > > RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
> > > > RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
> > > > R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
> > > > R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
> > > > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> > >
> > Apologies for the delay in my response. I was exploring internally to
> > see if it is a specific issue
> > with the adapter/host. I see the problem in multiple systems.
> >
> > > This feels like a bug in pci_enable_atomic_ops_to_root()? I assume it
> > > hit a case where bus->self == NULL?
> > yes. This crashes because of bus->self is NULL. Is it expected for VF?
>
> I'm not sure, you should ask the PCI lists
>
> > > Why not fix it there?
> > Since its a functional breakage in 5.14, I posted a quick fix for
> > 5.14. Also, we haven't done any testing on VF for this
> > feature. So I wanted to avoid claiming support for VF anyway.
> >
> > I see that other drivers also use pci_enable_atomic_ops_to_root
> > without vf/pf check. Anyone seeing this issue?
>
> Which is why I suspect the core code should be fixed not the driver..

Hi Jason,

A patch that avoids the crash is merged to the linux-pci tree.
https://lore.kernel.org/linux-pci/20210914201606.GA1452219@bjorn-Precision-5520/T/

With the PCI patch, the host will not crash. But the driver will get the
following error message when called for a VF:
"platform doesn't support global atomics."
We want to prevent calling pci_enable_atomic_ops_to_root for VF anyway.
Can you please pull this patch in bnxt_re?

Thanks,
Selvin

>
> Jason