On Mon 30-10-23 12:49:01, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Jan Kara wrote:
>
> > > >> What if we end up in "goto retry" more than once? I don't see a matching
> > > >
> > > > It is impossible. Before we jump to the retry label, we set
> > > > __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if
> > > > __GFP_DIRECT_RECLAIM is present (it will just wait until some other task
> > > > frees some objects into the mempool).
> > >
> > > Ah, missed that. And the traces don't show that we would be waiting for
> > > that. I'm starting to think the allocation itself is really not the issue
> > > here. Also I don't think it deprives something else of large order pages, as
> > > per the sysrq listing they still existed.
> > >
> > > What I rather suspect is what happens next to the allocated bio such that it
> > > works well with order-0 or up to costly_order pages, but there's some
> > > problem causing a deadlock if the bio contains larger pages than that?
> >
> > Hum, so in all the backtraces presented we see that we are waiting for page
> > writeback to complete but I don't see anything that would be preventing the
> > bios from completing. Page writeback can submit quite large bios so it kind
> > of makes sense that it trips up some odd behavior. Looking at the code
> > I can see one possible problem in crypt_alloc_buffer() but it doesn't
> > explain why reducing initial page order would help. Anyway: Are we
> > guaranteed mempool has enough pages for arbitrarily large bio that can
> > enter crypt_alloc_buffer()? I can see crypt_page_alloc() does limit the
> > number of pages in the mempool to dm_crypt_pages_per_client plus I assume
> > the percpu counter bias in cc->n_allocated_pages can limit the really
> > available number of pages even further. So if a single bio is large enough
> > to trip percpu_counter_read_positive(&cc->n_allocated_pages) >=
> > dm_crypt_pages_per_client condition in crypt_page_alloc(), we can loop
> > forever?
> > But maybe this cannot happen for some reason...
> >
> > If this is not it, I think we need to find out why the writeback bios are
> > not completing. Probably I'd start with checking what kcryptd,
> > presumably responsible for processing these bios, is doing.
> >
> > 								Honza
>
> cc->page_pool is initialized to hold BIO_MAX_VECS pages. crypt_map will
> restrict the bio size to BIO_MAX_VECS (see dm_accept_partial_bio being
> called from crypt_map).
>
> When we allocate a buffer in crypt_alloc_buffer, we first try the
> allocation without waiting, then we grab the mutex and we try the
> allocation with waiting.
>
> The mutex should prevent a deadlock when two processes allocate 128 pages
> concurrently and wait for each other to free some pages.
>
> The limit to dm_crypt_pages_per_client only applies to pages allocated
> from the kernel - when this limit is reached, we can still allocate from
> the mempool, so it shouldn't cause deadlocks.

Ah, ok, I missed the limitation of the bio size in crypt_map(). Thanks for
the explanation! So really the only advice I have now is to check what
kcryptd is doing when the system is stuck, because we didn't see it in any
of the stacktrace dumps.

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
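
For readers following the argument, here is a rough userspace sketch of why a cap on freshly allocated pages need not starve a request that fits in the reserve. It is only a toy model, not the dm-crypt code: every toy_* name is hypothetical, the sizes 8 and 3 are arbitrary stand-ins for the reserve size and for dm_crypt_pages_per_client, and where a real mempool would sleep until pages are freed the model simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary illustration values, NOT the kernel's. */
#define TOY_RESERVE   8	/* stand-in for the preallocated reserve */
#define TOY_FRESH_CAP 3	/* stand-in for the per-client page limit */

/* Hypothetical pool: capped "fresh" pages plus a fixed reserve. */
struct toy_pool {
	size_t fresh_in_use;	/* fresh pages handed out, subject to the cap */
	size_t reserve_free;	/* reserve pages still available */
};

enum toy_src { TOY_NONE, TOY_FRESH, TOY_RESERVED };

static void toy_pool_init(struct toy_pool *p)
{
	p->fresh_in_use = 0;
	p->reserve_free = TOY_RESERVE;
}

/* Take one page: fresh while under the cap, otherwise from the reserve. */
static enum toy_src toy_page_get(struct toy_pool *p)
{
	if (p->fresh_in_use < TOY_FRESH_CAP) {
		p->fresh_in_use++;
		return TOY_FRESH;
	}
	if (p->reserve_free > 0) {
		p->reserve_free--;
		return TOY_RESERVED;
	}
	return TOY_NONE;	/* a real mempool would wait here, not fail */
}

/* Return one page to wherever it came from. */
static void toy_page_put(struct toy_pool *p, enum toy_src src)
{
	if (src == TOY_FRESH) {
		assert(p->fresh_in_use > 0);
		p->fresh_in_use--;
	} else if (src == TOY_RESERVED) {
		assert(p->reserve_free < TOY_RESERVE);
		p->reserve_free++;
	}
}

/* Release the first n pages recorded in src[]. */
static void toy_buffer_free(struct toy_pool *p, size_t n, const enum toy_src *src)
{
	for (size_t i = 0; i < n; i++)
		toy_page_put(p, src[i]);
}

/*
 * Allocate n pages for one buffer. On failure, every page taken so far
 * is given back, so a failed attempt does not pin pages that another
 * buffer is waiting for.
 */
static bool toy_buffer_alloc(struct toy_pool *p, size_t n, enum toy_src *src)
{
	for (size_t i = 0; i < n; i++) {
		src[i] = toy_page_get(p);
		if (src[i] == TOY_NONE) {
			toy_buffer_free(p, i, src);
			return false;
		}
	}
	return true;
}
```

In this model a single buffer of up to TOY_RESERVE pages always succeeds against an idle pool, whatever the cap is, which mirrors the point that the limit only affects pages taken outside the reserve. A second buffer of the same size can fail while the first is outstanding; that is the situation where the real code waits, and where serializing the waiting allocations with a mutex keeps two partially filled buffers from waiting on each other.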