On Fri, Nov 03, 2023 at 04:01:19PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Nov 02, 2023 at 06:06:33PM +0100, Mikulas Patocka wrote:
> > On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:
> > > On Thu, Nov 02, 2023 at 10:28:57AM +0100, Mikulas Patocka wrote:
> > > > Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to some small value
> > > > (for example 64) and test if it helps.
> > >
> > > Yes, this helps too.
> >
> > On a plain upstream kernel with no other modifications (and with the
> > default max_sectors_kb), set /sys/module/nvme/parameters/sgl_threshold
> > to "0" and test whether it deadlocks. Then set the value to "1" and
> > test again.
>
> Got the deadlock with both values.
>
> > Revert sgl_threshold back to the default (32768). Boot the kernel with
> > the option "iommu=panic". Reproduce the deadlock and, if you get a
> > kernel panic, send us the panic log.
>
> This is Xen PV, so Linux is not in charge of the IOMMU here. And there is
> SWIOTLB involved (64MB of it), I'm not sure if for every DMA, but
> definitely for some.

So it's using xen_swiotlb_dma_ops, right?  That doesn't implement
.opt_mapping_size, and I'm guessing it should be equal to
swiotlb_max_mapping_size().

> > Then try this patch (without "iommu=panic"), reproduce the deadlock,
> > and tell us which of the "printk" statements is triggered during the
> > deadlock.
>
> I'll try this next.

Placing my bet now: you'll see a DMA mapping error.
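
For reference, a minimal sketch (untested, and not an actual upstream
patch) of what wiring up that hook in drivers/xen/swiotlb-xen.c could
look like.  Note that .opt_mapping_size takes no device argument, so it
cannot point at swiotlb_max_mapping_size() directly; the wrapper name
xen_swiotlb_opt_mapping_size below is made up for illustration, and the
value it returns is what swiotlb_max_mapping_size() computes for a
device without a min_align_mask:

#include <linux/dma-map-ops.h>
#include <linux/swiotlb.h>

/*
 * One full swiotlb segment: IO_TLB_SIZE (2 KiB) * IO_TLB_SEGSIZE (128)
 * = 256 KiB, the most a single mapping can bounce through swiotlb.
 */
static size_t xen_swiotlb_opt_mapping_size(void)
{
	return (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE;
}

const struct dma_map_ops xen_swiotlb_dma_ops = {
	/* ... the existing .map_page/.unmap_page/etc. callbacks ... */
	.opt_mapping_size = xen_swiotlb_opt_mapping_size,
};

With something like that in place, dma_opt_mapping_size() would report
256 KiB for the NVMe device instead of falling back to SIZE_MAX, which
lines up with the observation above that lowering max_sectors_kb makes
the deadlock go away.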