On Mon, 24 Jan 2022, Shakeel Butt wrote:

> On an overcommitted system which is running multiple workloads of
> varying priorities, it is preferred to trigger an oom-killer to kill a
> low priority workload than to let the high priority workload receiving
> ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> allow the oom-killer to kill a lower priority job.
>

What is the size of the allocations that io_mem_alloc() is doing?

If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
without the __GFP_NORETRY. To make the guarantee that workloads are not
receiving ENOMEM, it seems like we'd need to guarantee that allocations
going through io_mem_alloc() are sufficiently small.

(And if we're really serious about it, then even something like a
BUILD_BUG_ON().)

> Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> ---
>  fs/io_uring.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e54c4127422e..d9eeb202363c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr)
>
>  static void *io_mem_alloc(size_t size)
>  {
> -	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
> -				__GFP_NORETRY | __GFP_ACCOUNT;
> +	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
>
> -	return (void *) __get_free_pages(gfp_flags, get_order(size));
> +	return (void *) __get_free_pages(gfp, get_order(size));
>  }
>
>  static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
> --
> 2.35.0.rc0.227.g00780c9af4-goog
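
To make the order question concrete, here is a rough userspace sketch of the arithmetic, not kernel code. It assumes 4 KiB pages and uses the kernel's PAGE_ALLOC_COSTLY_ORDER value of 3. Every sketch_* and SKETCH_* name is hypothetical, and SKETCH_RING_BYTES is a made-up size used only to show a compile-time check in the spirit of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages; the threshold mirrors the
 * kernel's PAGE_ALLOC_COSTLY_ORDER, which is defined as 3. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_COSTLY_ORDER	3

/* Largest request that still fits in a non-costly order (8 pages). */
#define SKETCH_MAX_CHEAP_SIZE	(SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)

/*
 * Smallest order such that (page size << order) >= size, in the manner of
 * get_order(). A size of 0 maps to order 0 here, which is a choice made
 * for the sketch only.
 */
static int sketch_get_order(size_t size)
{
	/* Round up to whole pages without overflowing for huge sizes. */
	size_t pages = (size >> SKETCH_PAGE_SHIFT) +
		       ((size & (SKETCH_PAGE_SIZE - 1)) != 0);
	int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* True when a request of this size lands above the costly-order threshold. */
static bool sketch_is_costly(size_t size)
{
	return sketch_get_order(size) > SKETCH_COSTLY_ORDER;
}

/* Hypothetical fixed allocation size, checked at build time. */
#define SKETCH_RING_BYTES	(16 * 1024)
static_assert(SKETCH_RING_BYTES <= SKETCH_MAX_CHEAP_SIZE,
	      "allocation would need a costly order");
```

Under these assumptions anything above 32 KiB needs order 4 or higher. A build-time check only helps when the size is a compile-time constant; for a size computed at run time, a check along the lines of sketch_is_costly() would be the closest analogue.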