On Tue, 25 Jan 2022, Shakeel Butt wrote:

> > > On an overcommitted system which is running multiple workloads of
> > > varying priorities, it is preferred to trigger an oom-killer to kill a
> > > low priority workload than to let the high priority workload receiving
> > > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> > > ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> > > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> > > allow the oom-killer to kill a lower priority job.
> > >
> >
> > What is the size of the allocations that io_mem_alloc() is doing?
> >
> > If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> > without the __GFP_NORETRY. To make the guarantee that workloads are not
> > receiving ENOMEM, it seems like we'd need to guarantee that allocations
> > going through io_mem_alloc() are sufficiently small.
> >
> > (And if we're really serious about it, then even something like a
> > BUILD_BUG_ON().)
> >
>
> The test case provided to me for which the user was seeing ENOMEMs was
> io_uring_setup() with 64 entries (nothing else).
>
> If I understand rings_size() calculations correctly then the 0 order
> allocation was requested in io_mem_alloc().
>
> For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
> __GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim
> though I think that is a separate discussion. For this issue, we are
> seeing ENOMEMs even for order 0 allocations.
>

Ah, gotcha, thanks for the background.

IIUC, io_uring_setup() can be done with anything with CAP_SYS_NICE so my
only concern would be whether this could be used maliciously on a system
not using memcg, but in that case we can already fork many small processes
that consume all memory and oom kill everything else on the system
already.
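
For anyone following along, the order arithmetic in this thread can be
sketched in userspace C. This is only an illustration, not kernel code:
it assumes 4 KiB pages, order_for_size() is a hypothetical stand-in for
get_order(), and SKETCH_COSTLY_ORDER mirrors PAGE_ALLOC_COSTLY_ORDER
(which is 3). The sizes one would feed it are made up; the actual
rings_size() result is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages (the common default, e.g. on x86-64). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER, which is 3. */
#define SKETCH_COSTLY_ORDER 3

/*
 * Hypothetical userspace stand-in for get_order(): returns the smallest
 * order such that (PAGE_SIZE << order) >= size. Not the kernel's code.
 */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);

	/* Round the byte count up to whole pages. */
	pages = (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	/* Find the smallest power of two that covers that many pages. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}

/* Non-zero when the request would exceed the costly-order threshold. */
static int is_costly(size_t size)
{
	return order_for_size(size) > SKETCH_COSTLY_ORDER;
}
```

Under these assumptions anything up to 32 KiB stays at or below order 3,
and a request that fits in one page is order 0, which is the case
reported above for 64 entries if the reading of rings_size() is right.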