On Fri, Nov 03, 2023 at 04:01:19PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Nov 02, 2023 at 06:06:33PM +0100, Mikulas Patocka wrote:
> > On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:
> > > On Thu, Nov 02, 2023 at 10:28:57AM +0100, Mikulas Patocka wrote:
> > > > Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to some small value
> > > > (for example 64) and test if it helps.
> > >
> > > Yes, this helps too.
> >
> > On a plain upstream kernel with no other modifications (and with the
> > default max_sectors_kb), set /sys/module/nvme/parameters/sgl_threshold
> > to "0" and test whether it deadlocks. Then set the value to "1" and
> > test again.
>
> Got the deadlock with both values.
>
> > Revert sgl_threshold back to the default (32768). Boot the kernel with
> > the option "iommu=panic". Reproduce the deadlock and, if you get a
> > kernel panic, send us the panic log.
>
> This is Xen PV, so Linux is not in charge of the IOMMU here. And there is
> SWIOTLB involved (64MB of it), I'm not sure if for every DMA, but
> definitely for some.

So it's using xen_swiotlb_dma_ops, right?  That doesn't implement
.opt_mapping_size, and I'm guessing it should be equal to
swiotlb_max_mapping_size().

> > Then try this patch (without "iommu=panic"), reproduce the deadlock,
> > and tell us which of the "printk" statements is triggered during the
> > deadlock.
>
> I'll try this next.

Placing my bet now: you'll see a DMA mapping error.
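
For reference, a minimal sketch (untested, and not an actual upstream
patch) of what wiring up that hook in drivers/xen/swiotlb-xen.c could
look like.  Note that .opt_mapping_size takes no device argument, so it
cannot point at swiotlb_max_mapping_size() directly; the wrapper name
xen_swiotlb_opt_mapping_size below is made up for illustration, and the
value it returns is what swiotlb_max_mapping_size() computes for a
device without a min_align_mask:

#include <linux/dma-map-ops.h>
#include <linux/swiotlb.h>

/*
 * One full swiotlb segment: IO_TLB_SIZE (2 KiB) * IO_TLB_SEGSIZE (128)
 * = 256 KiB, the most a single mapping can bounce through swiotlb.
 */
static size_t xen_swiotlb_opt_mapping_size(void)
{
	return (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE;
}

const struct dma_map_ops xen_swiotlb_dma_ops = {
	/* ... the existing .map_page/.unmap_page/etc. callbacks ... */
	.opt_mapping_size = xen_swiotlb_opt_mapping_size,
};

With something like that in place, dma_opt_mapping_size() would report
256 KiB for the NVMe device instead of falling back to SIZE_MAX, which
lines up with the observation above that lowering max_sectors_kb makes
the deadlock go away.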