On Fri 08-03-19 09:08:57, Martynas Pumputis wrote:
> It has been observed that sometimes memory allocation for BPF maps
> fails when there is no obvious memory pressure in a system.
>
> E.g. the map (BPF_MAP_TYPE_LRU_HASH, key=38, value=56, max_elems=524288)
> could not be created due to vmalloc unable to allocate 75497472B,
> when the system's memory consumption (in MB) was the following:
>
> Total: 3942 Used: 837 (21.24%) Free: 138 Buffers: 239 Cached: 2727

Hmm, 75MB is quite large and much larger than what the slab/page
allocator can provide, so this is not really a fragmentation issue.
Vmalloc does respect __GFP_NORETRY, but considering that there shouldn't
be much memory pressure, I wonder how NORETRY managed to fail the
allocation. Do you happen to have the allocation failure report?

Btw. is there any real reason to open-code and duplicate the kvmalloc
logic here? In other words, why not simply make bpf_map_area_alloc use
kvmalloc_node with GFP_KERNEL?

> Considering dcda9b0471 ("mm, tree wide: replace __GFP_REPEAT by
> __GFP_RETRY_MAYFAIL with more useful semantic") we can replace
> __GFP_NORETRY with __GFP_RETRY_MAYFAIL, as it won't invoke OOM killer
> and will try harder to fulfil allocation requests.
>
> The change has been tested with the workloads mentioned above and by
> observing oom_kill value from /proc/vmstat.
>
> Signed-off-by: Martynas Pumputis <m@xxxxxxxxx>
> ---
>  kernel/bpf/syscall.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 62f6bced3a3c..eb5cefe44af3 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -136,11 +136,11 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
>  
>  void *bpf_map_area_alloc(size_t size, int numa_node)
>  {
> -	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
> -	 * trigger under memory pressure as we really just want to
> -	 * fail instead.
> +	/* We definitely need __GFP_NORETRY or __GFP_RETRY_MAYFAIL, so
> +	 * OOM killer doesn't trigger under memory pressure as we really
> +	 * just want to fail instead.
>  	 */
> -	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> +	const gfp_t flags = __GFP_NOWARN | __GFP_RETRY_MAYFAIL | __GFP_ZERO;
>  	void *area;
>  
>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -- 
> 2.21.0
>

-- 
Michal Hocko
SUSE Labs
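
For reference, the arithmetic behind the size check quoted in the patch
context, as a minimal userspace sketch rather than kernel code. It
assumes 4 KiB pages and PAGE_ALLOC_COSTLY_ORDER == 3; the SKETCH_ and
sketch_ names are hypothetical stand-ins, and the quoted diff shows only
the size comparison itself, so treating everything above the limit as a
vmalloc request is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 4 KiB pages and a costly order of 3. These are
 * illustrative stand-ins for PAGE_SIZE and PAGE_ALLOC_COSTLY_ORDER,
 * not the kernel's own definitions. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_COSTLY_ORDER	3

/* Upper bound of the "size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)"
 * check under the assumptions above. */
static unsigned long sketch_small_alloc_limit(void)
{
	return SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER;
}

/* Returns 1 when a request of 'size' bytes fails the size check and
 * would therefore (by assumption) be served by vmalloc only. */
static int sketch_is_vmalloc_only(unsigned long size)
{
	assert(size > 0);
	return size > sketch_small_alloc_limit();
}

/* Number of pages needed to back 'size' bytes, rounded up. */
static unsigned long sketch_pages_needed(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions the limit is 32 KiB, so the reported 75497472B
request (exactly 72 MiB, or 18432 pages) is far above it and never
qualifies for the small-allocation branch.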