Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

Jan Kara <jack@xxxxxxx> · Mon, 30 Oct 2023 13:11:58 +0100

On Mon 30-10-23 12:49:01, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Jan Kara wrote:
> > > >> What if we end up in "goto retry" more than once? I don't see a matching
> > > > 
> > > > It is impossible. Before we jump to the retry label, we set 
> > > > __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if 
> > > > __GFP_DIRECT_RECLAIM is present (it will just wait until some other task 
> > > > frees some objects into the mempool).
> > > 
> > > Ah, missed that. And the traces don't show that we would be waiting for
> > > that. I'm starting to think the allocation itself is really not the issue
> > > here. Also I don't think it deprives something else of large order pages, as
> > > per the sysrq listing they still existed.
> > > 
> > > What I rather suspect is what happens next to the allocated bio such that it
> > > works well with order-0 or up to costly_order pages, but there's some
> > > problem causing a deadlock if the bio contains larger pages than that?
> > 
> > Hum, so in all the backtraces presented we see that we are waiting for page
> > writeback to complete but I don't see anything that would be preventing the
> > bios from completing. Page writeback can submit quite large bios so it kind
> > of makes sense that it trips up some odd behavior. Looking at the code
> > I can see one possible problem in crypt_alloc_buffer() but it doesn't
> > explain why reducing initial page order would help. Anyway: Are we
> > guaranteed mempool has enough pages for arbitrarily large bio that can
> > enter crypt_alloc_buffer()? I can see crypt_page_alloc() does limit the
> > number of pages in the mempool to dm_crypt_pages_per_client plus I assume
> > the percpu counter bias in cc->n_allocated_pages can limit the really
> > available number of pages even further. So if a single bio is large enough
> > to trip percpu_counter_read_positive(&cc->n_allocated_pages) >=
> > dm_crypt_pages_per_client condition in crypt_page_alloc(), we can loop
> > forever? But maybe this cannot happen for some reason...
> > 
> > If this is not it, I think we need to find out why the writeback bios are
> > not completing. Probably I'd start with checking what is kcryptd,
> > presumably responsible for processing these bios, doing.
> > 
> > 								Honza
> 
> cc->page_pool is initialized to hold BIO_MAX_VECS pages. crypt_map will 
> restrict the bio size to BIO_MAX_VECS (see dm_accept_partial_bio being 
> called from crypt_map).
> 
> When we allocate a buffer in crypt_alloc_buffer, we try first allocation 
> without waiting, then we grab the mutex and we try allocation with 
> waiting.
> 
> The mutex should prevent a deadlock when two processes allocate 128 pages 
> concurrently and wait for each other to free some pages.
> 
> The limit to dm_crypt_pages_per_client only applies to pages allocated 
> from the kernel - when this limit is reached, we can still allocate from 
> the mempool, so it shouldn't cause deadlocks.
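
(Editorial sketch, not dm-crypt code.) The argument quoted above can be
modelled in a few lines of userspace C. Everything here is hypothetical:
the names pool_model, model_take and model_release, and the reserve size
of 8 standing in for BIO_MAX_VECS. It assumes a simplified mempool that
tries the "kernel" first while under a limit and otherwise falls back to
a fixed reserve; sleeping is represented by returning fewer pages than
requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reserve size; stands in for the BIO_MAX_VECS-page pool. */
#define MODEL_RESERVE 8

struct pool_model {
	size_t reserve_free;     /* preallocated pages still in the reserve */
	size_t kernel_allocated; /* pages currently taken from the "kernel" */
	size_t kernel_limit;     /* stands in for dm_crypt_pages_per_client */
};

/* Where the pages of one request came from. */
struct taken {
	size_t from_kernel;
	size_t from_reserve;
};

static size_t taken_total(struct taken t)
{
	return t.from_kernel + t.from_reserve;
}

/*
 * Take up to nr pages without sleeping. The limit only gates the
 * "kernel" path; once it is hit, the reserve is still usable. A short
 * return means a blocking caller would have to wait at this point.
 */
static struct taken model_take(struct pool_model *p, size_t nr)
{
	struct taken t = { 0, 0 };

	while (taken_total(t) < nr) {
		if (p->kernel_allocated < p->kernel_limit) {
			p->kernel_allocated++;
			t.from_kernel++;
		} else if (p->reserve_free > 0) {
			p->reserve_free--;
			t.from_reserve++;
		} else {
			break;
		}
	}
	return t;
}

/* Return the pages of one request to where they came from. */
static void model_release(struct pool_model *p, struct taken t)
{
	assert(p->kernel_allocated >= t.from_kernel);
	p->kernel_allocated -= t.from_kernel;
	p->reserve_free += t.from_reserve;
}
```

With kernel_limit set to 0, a single request of MODEL_RESERVE pages is
still satisfied entirely from the reserve, which is the "limit cannot
cause a deadlock" case. If two requests each take half of the reserve
and then both ask for more, neither gets anything; that is the situation
the mutex around the waiting allocation is meant to rule out by letting
only one waiter drain the reserve at a time.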

Ah, ok, I missed the limitation of the bio size in crypt_map(). Thanks for
the explanation! So really the only advice I have now is to check what
kcryptd is doing when the system is stuck, because we didn't see it in any
of the stacktrace dumps.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR