On Mon, 30 Oct 2023, Jan Kara wrote:

> > >> What if we end up in "goto retry" more than once? I don't see a matching
> > >
> > > It is impossible. Before we jump to the retry label, we set
> > > __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if
> > > __GFP_DIRECT_RECLAIM is present (it will just wait until some other task
> > > frees some objects into the mempool).
> >
> > Ah, missed that. And the traces don't show that we would be waiting for
> > that. I'm starting to think the allocation itself is really not the issue
> > here. Also I don't think it deprives something else of large order pages, as
> > per the sysrq listing they still existed.
> >
> > What I rather suspect is what happens next to the allocated bio such that it
> > works well with order-0 or up to costly_order pages, but there's some
> > problem causing a deadlock if the bio contains larger pages than that?
>
> Hum, so in all the backtraces presented we see that we are waiting for page
> writeback to complete but I don't see anything that would be preventing the
> bios from completing. Page writeback can submit quite large bios so it kind
> of makes sense that it trips up some odd behavior. Looking at the code
> I can see one possible problem in crypt_alloc_buffer() but it doesn't
> explain why reducing initial page order would help. Anyway: Are we
> guaranteed mempool has enough pages for arbitrarily large bio that can
> enter crypt_alloc_buffer()? I can see crypt_page_alloc() does limit the
> number of pages in the mempool to dm_crypt_pages_per_client plus I assume
> the percpu counter bias in cc->n_allocated_pages can limit the really
> available number of pages even further. So if a single bio is large enough
> to trip percpu_counter_read_positive(&cc->n_allocated_pages) >=
> dm_crypt_pages_per_client condition in crypt_page_alloc(), we can loop
> forever? But maybe this cannot happen for some reason...
>
> If this is not it, I think we need to find out why the writeback bios are
> not completing. Probably I'd start with checking what is kcryptd,
> presumably responsible for processing these bios, doing.
>
> 								Honza

cc->page_pool is initialized to hold BIO_MAX_VECS pages. crypt_map will
restrict the bio size to BIO_MAX_VECS (see dm_accept_partial_bio being
called from crypt_map).

When we allocate a buffer in crypt_alloc_buffer, we first try the
allocation without waiting; if that fails, we grab the mutex and retry
the allocation with waiting. The mutex should prevent a deadlock when
two processes allocate 128 pages concurrently and wait for each other to
free some pages.

The limit to dm_crypt_pages_per_client only applies to pages allocated
from the kernel - when this limit is reached, we can still allocate from
the mempool, so it shouldn't cause deadlocks.

Mikulas
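
To illustrate the deadlock argument above, here is a small user-space model. It is only a sketch, not the dm-crypt code: simulate(), its parameters and MAX_TASKS are hypothetical names made up for this example, and the 128-page figures used with it come from the example in the text. The model covers only the waiting pass (the initial non-waiting attempt is left out), it uses a deterministic round-robin schedule instead of real threads, and it returns a task's pages as soon as the task holds all of them, which stands in for the write completing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8	/* hypothetical cap, only for this model */

/*
 * Model: nr_tasks tasks each need `need` pages from a shared reserve of
 * `pool_pages` pages. Tasks take one page per round-robin turn.
 * If `serialize` is true, only the task holding the (modelled) mutex may
 * take pages; the others wait for the mutex.
 * Returns the number of tasks that obtained all their pages.
 */
static size_t simulate(size_t nr_tasks, size_t need, size_t pool_pages,
		       bool serialize)
{
	size_t held[MAX_TASKS] = {0};
	bool done[MAX_TASKS] = {false};
	size_t free_pages = pool_pages;
	size_t finished = 0;
	int owner = -1;		/* task holding the mutex, -1 if none */
	bool progress = true;

	assert(nr_tasks <= MAX_TASKS);

	while (progress) {
		progress = false;
		for (size_t t = 0; t < nr_tasks; t++) {
			if (done[t])
				continue;
			if (serialize) {
				if (owner == -1)
					owner = (int)t;	/* take the mutex */
				if (owner != (int)t)
					continue;	/* blocked on the mutex */
			}
			if (free_pages == 0)
				continue;	/* blocked waiting for a page */
			free_pages--;
			held[t]++;
			progress = true;
			if (held[t] == need) {
				/* stands in for I/O completion: pages return */
				free_pages += held[t];
				held[t] = 0;
				done[t] = true;
				finished++;
				if (serialize)
					owner = -1;	/* drop the mutex */
			}
		}
	}
	/* the loop ends when all tasks finished or nobody can progress */
	return finished;
}
```

In this model, two tasks that each need 128 pages from a 128-page reserve both stall at 64 pages when they are not serialized, and both finish when they are. A single task asking for more pages than the reserve holds never finishes even when serialized, which is why the cap on the bio size matters for the argument.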