On Wed, Dec 28, 2022 at 1:41 AM Eric Pilmore <epilmore@xxxxxxxxxx> wrote:
>
> Wondering if this might be a known issue in the ptdma DMA driver. Did
> not see anything obvious in bugzilla.
>
> I am doing some testing of the ntb_netdev module in conjunction with
> the ptdma module as the supporting DMA engines on an AMD Rome CPU
> based platform. The ptdma driver being used is the latest code in the
> Linux (6.2) repository.
>
> There are no issues in doing simple ping operations across the
> ntb_netdev (TCP/IP) interface, including sending large packets which
> we know will cause the respective DMA engines to be utilized. However,
> while doing iperf testing across the ntb_netdev interface, we have
> encountered a panic:
>
> [ 1626.776583] RIP: 0010:mutex_spin_on_owner+0x3b/0xa0
> ....
> [ 1626.776588] Call Trace:
> [ 1626.776588] <IRQ>
> [ 1626.776589] __mutex_lock.isra.7+0xad/0x4c0
> [ 1626.776589] ? ntb_transport_rx_enqueue+0x127/0x200 [ntb_transport]
> [ 1626.776589] __mutex_lock_slowpath+0x13/0x20
> [ 1626.776590] ? __mutex_lock_slowpath+0x13/0x20
> [ 1626.776590] mutex_lock+0x2f/0x40
> [ 1626.776590] pt_core_perform_passthru+0xc5/0x160 [ptdma]
> [ 1626.776591] pt_cmd_callback.part.7+0x262/0x2d0 [ptdma]
> [ 1626.776591] pt_cmd_callback+0x13/0x20 [ptdma]
> [ 1626.776591] pt_check_status_trans+0xc3/0x120 [ptdma]
> [ 1626.776592] pt_core_irq_handler+0x36/0x60 [ptdma]
> [ 1626.776592] __handle_irq_event_percpu+0x44/0x1a0
> [ 1626.776592] handle_irq_event_percpu+0x32/0x80
> [ 1626.776593] handle_irq_event+0x3b/0x60
> [ 1626.776593] handle_edge_irq+0x83/0x1a0
> [ 1626.776593] handle_irq+0x20/0x30
> [ 1626.776593] do_IRQ+0x50/0xe0
> [ 1626.776594] common_interrupt+0xf/0xf
>
> The issue is that the ptdma handlers are getting called in interrupt
> context, and ultimately the flow leads to pt_core_execute_cmd() which
> will attempt to grab a mutex, which is really not appropriate in
> interrupt context. I have temporarily changed the lock in question to
> a spinlock, which seems to have resolved the issue. However, I don't
> know enough about the ptdma driver to really know if this is the
> desired repair.
>
> Hoping that others with more knowledge in this driver might be able to
> comment as to the validity of this bug and whether a spinlock is the
> correct approach here. If it is, I would be happy to submit a patch,
> otherwise I can just file a bugzilla for the module owner to make a
> more appropriate fix.
>
> Thanks for any advice.
>
> Eric Pilmore

I haven't heard any further on this, so I filed a bugzilla so it doesn't get lost.

Eric