Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:

> On Tue, Oct 31, 2023 at 06:24:19PM +0100, Mikulas Patocka wrote:
> 
> > > Hi
> > > 
> > > I would like to ask you to try this patch. Revert the changes to "order" 
> > > and "PAGE_ALLOC_COSTLY_ORDER" back to normal and apply this patch on a 
> > > clean upstream kernel.
> > > 
> > > Does it deadlock?
> > > 
> > > There is a bug in dm-crypt: it doesn't account for large pages in 
> > > cc->n_allocated_pages. This patch fixes the bug.
> 
> This patch did not help.
> 
> > If the previous patch didn't fix it, try this patch (on a clean upstream 
> > kernel).
> >
> > This patch allocates large pages, but it breaks them up into single-page 
> > entries when adding them to the bio.
> 
> But this does help.

Thanks. So we can stop blaming the memory allocator and start blaming the 
NVMe subsystem.


I added the NVMe maintainers to this thread. Here is a summary of the 
problem: in dm-crypt, we allocate a large compound page and add this 
compound page to the bio as a single big vector entry. Marek reports that 
on his system this causes deadlocks, and the deadlocks look like a lost bio 
that was never completed. When I chop the large compound page into 
individual pages in dm-crypt and add a bio vector for each of them, Marek 
reports that there are no longer any deadlocks. So we have a problem 
(either hardware or software) where the NVMe subsystem doesn't correctly 
handle bio vectors with a large bv_len. This is the original bug report: 
https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/
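To make the two bio layouts concrete, here is a small userspace model; it is not kernel code and does not call the real bio API. It assumes 4 KiB pages and uses a hypothetical `struct model_bvec` as a stand-in for `struct bio_vec` (a page index replaces the `struct page *`). The helpers `fill_as_one_entry()` and `fill_per_page()` are illustrative names, not dm-crypt functions. Both layouts describe the same bytes; only the number of vector entries and their bv_len differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct bio_vec: a page index instead of struct page *. */
struct model_bvec {
    size_t page;          /* index of the first page covered by this entry */
    unsigned int len;     /* bytes covered by this entry (like bv_len) */
    unsigned int offset;  /* offset into the first page (like bv_offset) */
};

/* Layout reported to hang: the whole compound page as ONE vector entry. */
size_t fill_as_one_entry(size_t first_page, unsigned int order,
                         struct model_bvec *vec, size_t max)
{
    assert(order < 16);   /* keeps MODEL_PAGE_SIZE << order inside unsigned int */
    if (max < 1)
        return 0;
    vec[0].page = first_page;
    vec[0].len = MODEL_PAGE_SIZE << order;
    vec[0].offset = 0;
    return 1;
}

/* Workaround layout: one vector entry per base page of the compound page. */
size_t fill_per_page(size_t first_page, unsigned int order,
                     struct model_bvec *vec, size_t max)
{
    size_t n;

    assert(order < 16);
    n = (size_t)1 << order;
    if (n > max)
        return 0;         /* not enough room for all entries */
    for (size_t i = 0; i < n; i++) {
        vec[i].page = first_page + i;
        vec[i].len = MODEL_PAGE_SIZE;
        vec[i].offset = 0;
    }
    return n;
}

/* Total bytes described by a vector; both layouts must agree on this. */
unsigned long total_len(const struct model_bvec *vec, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += vec[i].len;
    return sum;
}
```

For an order-3 allocation, the first layout yields one entry with a bv_len of 32768 bytes, while the second yields eight entries of 4096 bytes each, covering the same data.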


Marek, what NVMe devices do you use? Do you use the same device on all 3 
machines where you hit this bug?

In the directory /sys/block/nvme0n1/queue, what are the values of 
dma_alignment, max_hw_sectors_kb, max_sectors_kb, max_segment_size, 
max_segments and virt_boundary_mask?
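A minimal C sketch for collecting those values in one go, assuming nvme0n1 is the affected device. `read_queue_attr()` and `dump_queue_limits()` are hypothetical helpers; they only read the sysfs files named above, so the same information can be obtained by reading each file by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The queue attributes asked about; all live in /sys/block/<dev>/queue/. */
static const char *const queue_attrs[] = {
    "dma_alignment", "max_hw_sectors_kb", "max_sectors_kb",
    "max_segment_size", "max_segments", "virt_boundary_mask",
};

/* Read the single-line file dir/name into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the path is too long or the file can't be read. */
int read_queue_attr(const char *dir, const char *name, char *buf, size_t len)
{
    char path[512];
    FILE *f;
    int n;

    assert(buf != NULL && len > 1);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print every attribute for one queue directory, e.g. "/sys/block/nvme0n1/queue". */
void dump_queue_limits(const char *dir)
{
    char val[64];
    size_t i;

    for (i = 0; i < sizeof(queue_attrs) / sizeof(queue_attrs[0]); i++) {
        if (read_queue_attr(dir, queue_attrs[i], val, sizeof(val)) == 0)
            printf("%-20s %s\n", queue_attrs[i], val);
        else
            printf("%-20s (unreadable)\n", queue_attrs[i]);
    }
}
```

Calling `dump_queue_limits("/sys/block/nvme0n1/queue")` prints one line per attribute; an attribute that is absent on a given kernel is reported as unreadable instead of aborting the whole dump.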

Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to a small value 
(for example, 64) and test whether it helps.
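The same change can be made from a program; a minimal sketch, assuming root privileges and the nvme0n1 device. `set_max_sectors_kb()` is a hypothetical helper that simply writes the number into the sysfs file. With 512-byte sectors, a limit of 64 KiB caps each request at 128 sectors. The setting lives only in sysfs, so it does not survive a reboot.

```c
#include <assert.h>
#include <stdio.h>

/* Write a new max_sectors_kb value into dir/max_sectors_kb.
 * On a real system dir would be "/sys/block/nvme0n1/queue" (needs root).
 * Returns 0 on success, -1 on any open/write/close failure. */
int set_max_sectors_kb(const char *dir, unsigned int kb)
{
    char path[512];
    FILE *f;
    int n, ok;

    assert(kb > 0);
    n = snprintf(path, sizeof(path), "%s/max_sectors_kb", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%u\n", kb) > 0;
    /* stdio buffers the write, so a value rejected by sysfs (e.g. one above
     * max_hw_sectors_kb) may only show up as an error from fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Checking the return value of `fclose()` matters here because the actual write to the sysfs file usually happens when the stdio buffer is flushed.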

Mikulas
