On Tue, Jan 25, 2022 at 10:35 AM David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> On Mon, 24 Jan 2022, Shakeel Butt wrote:
>
> > On an overcommitted system which is running multiple workloads of
> > varying priorities, it is preferred to trigger an oom-killer to kill a
> > low priority workload than to let the high priority workload receiving
> > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> > ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> > allow the oom-killer to kill a lower priority job.
> >
>
> What is the size of the allocations that io_mem_alloc() is doing?
>
> If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> without the __GFP_NORETRY. To make the guarantee that workloads are not
> receiving ENOMEM, it seems like we'd need to guarantee that allocations
> going through io_mem_alloc() are sufficiently small.
>
> (And if we're really serious about it, then even something like a
> BUILD_BUG_ON().)
>

The test case provided to me, for which the user was seeing ENOMEMs, was
io_uring_setup() with 64 entries (nothing else). If I understand the
rings_size() calculation correctly, an order-0 allocation was requested
in io_mem_alloc().

For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
__GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim,
though I think that is a separate discussion. For this issue, we are
seeing ENOMEMs even for order-0 allocations.
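
To make the order check discussed above concrete, here is a rough
userspace sketch, not kernel code. It assumes 4 KiB pages
(PAGE_SHIFT 12) and PAGE_ALLOC_COSTLY_ORDER 3. The helper names
get_order_sketch() and is_costly_order() are made up for illustration
and only approximate what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages and a costly-order
 * threshold of 3. Real values depend on architecture and config. */
#define PAGE_SHIFT 12
#define PAGE_SIZE ((size_t)1 << PAGE_SHIFT)
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical helper: smallest order such that
 * (PAGE_SIZE << order) >= size. Approximates the kernel's get_order(). */
static int get_order_sketch(size_t size)
{
	size_t pages;
	int order = 0;

	assert(size > 0); /* order of a zero-sized request is not meaningful */

	/* Round the byte size up to whole pages. */
	pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

	/* Find the smallest power of two that covers that many pages. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}

/* Hypothetical helper: true when the request is above the costly
 * threshold, i.e. the case where the allocation may fail even
 * without __GFP_NORETRY. */
static int is_costly_order(size_t size)
{
	return get_order_sketch(size) > PAGE_ALLOC_COSTLY_ORDER;
}
```

Under these assumptions, any request of up to 4096 bytes is order 0,
and a request only becomes costly once it exceeds 32768 bytes
(order 4 and up).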