On 21/08/18 05:28 PM, Eric Pilmore wrote:
> On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
> > On 21/08/18 05:18 PM, Eric Pilmore wrote:
> > > We have been running locally with Kit's change for dma_map_resource and its
> > > incorporation in ntb_async_tx_submit for the destination address and it runs
> > > fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based
> > > system, regardless of whether the DMA engine being used is IOAT or a PLX
> > > device sitting in the PCIe tree. However, when we go back to a i7 (i7-7700K
> > > CPU @ 4.20GHz) based system it seems to run into issues, specifically when
> > > put under a load. In this case, just having a load using a single ping
> > > command with an interval=0, i.e. no delay between ping packets, after a few
> > > thousand packets the system just hangs. No panic or watchdogs. Note that in
> > > this scenario I can only use a PLX DMA engine.
> >
> > This is just my best guess: but it sounds to me like a bug in the PLX
> > DMA driver or hardware.
>
> The PLX DMA driver? But the PLX driver isn't really even involved in the
> mapping stage. Are you thinking maybe the stage at which the DMA descriptor
> is freed and the PLX DMA driver does a dma_descriptor_unmap?

Hmm, well what would make you think the hang is during
mapping/unmapping? I would expect a hang to be in handling completions
from the DMA engine or something like that.

> Again, PLX did not exhibit any issues on the Xeon system.

Oh, I missed that. That puts a crinkle in my theory but, as you say, it
could be a timing issue.

Also, it's VERY strange that it would hang the entire system. That makes
things very hard to debug...

Logan