Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Mon, 30 Oct 2023 12:49:01 +0100 (CET)

On Mon, 30 Oct 2023, Jan Kara wrote:

> > >> What if we end up in "goto retry" more than once? I don't see a matching
> > > 
> > > It is impossible. Before we jump to the retry label, we set 
> > > __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if 
> > > __GFP_DIRECT_RECLAIM is present (it will just wait until some other task 
> > > frees some objects into the mempool).
> > 
> > Ah, missed that. And the traces don't show that we would be waiting for
> > that. I'm starting to think the allocation itself is really not the issue
> > here. Also I don't think it deprives something else of large order pages, as
> > per the sysrq listing they still existed.
> > 
> > What I rather suspect is what happens next to the allocated bio such that it
> > works well with order-0 or up to costly_order pages, but there's some
> > problem causing a deadlock if the bio contains larger pages than that?
> 
> Hum, so in all the backtraces presented we see that we are waiting for page
> writeback to complete but I don't see anything that would be preventing the
> bios from completing. Page writeback can submit quite large bios so it kind
> of makes sense that it trips up some odd behavior. Looking at the code
> I can see one possible problem in crypt_alloc_buffer() but it doesn't
> explain why reducing initial page order would help. Anyway: Are we
> guaranteed mempool has enough pages for arbitrarily large bio that can
> enter crypt_alloc_buffer()? I can see crypt_page_alloc() does limit the
> number of pages in the mempool to dm_crypt_pages_per_client plus I assume
> the percpu counter bias in cc->n_allocated_pages can limit the really
> available number of pages even further. So if a single bio is large enough
> to trip percpu_counter_read_positive(&cc->n_allocated_pages) >=
> dm_crypt_pages_per_client condition in crypt_page_alloc(), we can loop
> forever? But maybe this cannot happen for some reason...
> 
> If this is not it, I think we need to find out why the writeback bios are
> not completing. Probably I'd start with checking what is kcryptd,
> presumably responsible for processing these bios, doing.
> 
> 								Honza

cc->page_pool is initialized to hold BIO_MAX_VECS pages. crypt_map will 
restrict the bio size to BIO_MAX_VECS pages (see dm_accept_partial_bio 
being called from crypt_map), so a single bio never needs more pages than 
the mempool holds.

When we allocate a buffer in crypt_alloc_buffer, we first try the 
allocation without waiting; if that fails, we grab the mutex and retry 
the allocation with waiting.

The mutex should prevent a deadlock when two processes allocate 128 pages 
concurrently and wait for each other to free some pages.
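
To make the scenario concrete, here is a rough user-space model (not 
kernel code). POOL_PAGES is assumed to be 256 as a stand-in for 
BIO_MAX_VECS, and struct pool and all function names are invented for 
illustration. It only shows the arithmetic: two unserialized requesters 
that each want the whole pool end up holding half and stall, while 
serialized requesters complete one after the other.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_PAGES 256u /* assumed stand-in for BIO_MAX_VECS */

struct pool { unsigned free; };

/* Non-blocking attempt: take one page if the pool has any left. */
static bool pool_try_get(struct pool *p)
{
	if (p->free == 0)
		return false;
	p->free--;
	return true;
}

/* Return n pages to the pool (models bio completion). */
static void pool_put(struct pool *p, unsigned n)
{
	assert(p->free + n <= POOL_PAGES);
	p->free += n;
}

/*
 * Two requesters take pages in lockstep, with no serialization.
 * Returns true if neither can make progress before both are satisfied,
 * i.e. each would wait forever for the other to free pages.
 */
static bool interleaved_gets_stuck(unsigned want)
{
	struct pool p = { POOL_PAGES };
	unsigned held_a = 0, held_b = 0;

	while (held_a < want || held_b < want) {
		bool progress = false;

		if (held_a < want && pool_try_get(&p)) {
			held_a++;
			progress = true;
		}
		if (held_b < want && pool_try_get(&p)) {
			held_b++;
			progress = true;
		}
		if (!progress)
			return true;
	}
	return false;
}

/*
 * Same two requesters, but the one holding the "mutex" gets its whole
 * buffer and releases it before the other starts allocating.
 * Returns true only if a request can never be satisfied by the pool.
 */
static bool serialized_gets_stuck(unsigned want)
{
	struct pool p = { POOL_PAGES };

	for (int who = 0; who < 2; who++) {
		unsigned held = 0;

		while (held < want) {
			if (!pool_try_get(&p))
				return true;
			held++;
		}
		pool_put(&p, held);
	}
	return false;
}
```

In this model the serialized case only works because a request is never 
larger than the pool, which is what the size restriction in crypt_map is 
for.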

The dm_crypt_pages_per_client limit only applies to pages allocated 
from the kernel - when this limit is reached, we can still allocate from 
the mempool's reserve, so it shouldn't cause deadlocks.
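
As a simplified illustration of that fallback order (again a user-space 
model, not the actual dm-crypt or mempool code), the sketch below uses a 
made-up struct client whose fields stand in for cc->n_allocated_pages, 
dm_crypt_pages_per_client and the mempool reserve. It ignores percpu 
counter imprecision and assumes the kernel allocation itself never fails 
for other reasons.

```c
#include <assert.h>

/* Hypothetical model of one dm-crypt client; all names are illustrative. */
struct client {
	long allocated_from_kernel; /* stand-in for cc->n_allocated_pages */
	long kernel_limit;          /* stand-in for dm_crypt_pages_per_client */
	long reserve_free;          /* pages pre-allocated in the mempool */
};

enum source { FROM_KERNEL, FROM_RESERVE, MUST_WAIT };

/*
 * Decide where the next page comes from:
 * 1. a fresh kernel page, as long as the per-client limit allows it;
 * 2. otherwise a page from the mempool reserve, which the limit does
 *    not cover;
 * 3. otherwise the caller has to wait until someone frees a page.
 */
static enum source get_page(struct client *c)
{
	assert(c->reserve_free >= 0);

	if (c->allocated_from_kernel < c->kernel_limit) {
		c->allocated_from_kernel++;
		return FROM_KERNEL;
	}
	if (c->reserve_free > 0) {
		c->reserve_free--;
		return FROM_RESERVE;
	}
	return MUST_WAIT;
}
```

Hitting the limit only switches the source from the kernel to the 
reserve; waiting starts only once the reserve is empty as well.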

Mikulas