On 10/30/23 12:22, Mikulas Patocka wrote:
>
>
> On Mon, 30 Oct 2023, Vlastimil Babka wrote:
>
>> Ah, missed that. And the traces don't show that we would be waiting for
>> that. I'm starting to think the allocation itself is really not the issue
>> here. Also I don't think it deprives something else of large order pages, as
>> per the sysrq listing they still existed.
>>
>> What I rather suspect is what happens next to the allocated bio such that it
>> works well with order-0 or up to costly_order pages, but there's some
>> problem causing a deadlock if the bio contains larger pages than that?
>
> Yes. There are many "if (order > PAGE_ALLOC_COSTLY_ORDER)" branches in the
> memory allocation code and I suppose that one of them does something bad
> and triggers this bug. But I don't know which one.

That's not what I meant. All the interesting branches for costly orders in the
page allocator and compaction only apply with __GFP_DIRECT_RECLAIM, so we
can't be hitting those here. The traces I've seen suggest that the allocation
of the bio succeeded, and the problems arose only after it was submitted.

I wouldn't even be surprised if the threshold for hitting the bug were not
exactly order > PAGE_ALLOC_COSTLY_ORDER but rather
order > PAGE_ALLOC_COSTLY_ORDER + 1 or + 2 (has that been tested?), or if
there were no exact threshold at all and the probability simply increased
with order.

>> Cc Honza. The thread starts here:
>> https://lore.kernel.org/all/ZTNH0qtmint%2FzLJZ@mail-itl/
>>
>> The linked Qubes report has a number of blocked task listings that can be
>> expanded:
>> https://github.com/QubesOS/qubes-issues/issues/8575
>
> Mikulas
>