Re: [PATCH] bpf: Try harder when allocating memory for maps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 08-03-19 12:30:47, Daniel Borkmann wrote:
> On 03/08/2019 11:55 AM, Michal Hocko wrote:
> > On Fri 08-03-19 11:33:00, Daniel Borkmann wrote:
> >> On 03/08/2019 09:44 AM, Michal Hocko wrote:
> >>> On Fri 08-03-19 09:08:57, Martynas Pumputis wrote:
> >>
> >> Martynas, for the patch, please also Cc netdev in the submission so
> >> that it lands properly in patchwork. Setup where patches only Cc'ed
> >> to bpf@xxxxxxxxxxxxxxx would land in our delegate is not yet completed
> >> by ozlabs folks, just fyi.
> >>
> >>>> It has been observed that sometimes memory allocation for BPF maps
> >>>> fails when there is no obvious memory pressure in a system.
> >>>>
> >>>> E.g. the map (BPF_MAP_TYPE_LRU_HASH, key=38, value=56, max_elems=524288)
> >>>> could not be created due to due to vmalloc unable to allocate 75497472B,
> >>>> when the system's memory consumption (in MB) was the following:
> >>>>
> >>>>     Total: 3942 Used: 837 (21.24%) Free: 138 Buffers: 239 Cached: 2727
> >>>
> >>> Hmm 75MB is quite large and much larger than the slab/page allocator
> >>> cann provide so this is not really a fragmentation issue. Vmalloc does
> >>
> >> Agree.
> >>
> >>> respect noretry but considering that there shouldn't be a large memory
> >>> pressure I wonder how NORETRY managed to fail the allocation. Do you
> >>> happen to have the allocation failure report?
> >>
> >> I'll defer to Martynas here.
> >>
> >>> Btw. is there any real reason to opencode and duplicate kvmalloc logic
> >>> here? In other words why not simply make bpf_map_area_alloc use
> >>> kvmalloc_node with GFP_KERNEL?
> >>
> >> Mostly historical reasons from d407bd25a204 ("bpf: don't trigger OOM killer
> >> under pressure with map alloc"). I remember back then we had a discussion
> >> that __GFP_NORETRY is not fully supported and should only be seen as a hint
> >> in our case (since it's not propagated all the way through in vmalloc, if
> >> I recall correctly).
> > 
> > Yes, that is still the case and there is no way to really have nooom
> > semantic for vmalloc. Even with your opencoded version btw.
> 
> Okay, so similar situation applies to __GFP_RETRY_MAYFAIL reclaim modifier
> then if I understand you correctly? In dcda9b0471, its mentioned "this
> means that all the reclaim opportunities have been exhausted except the
> most disruptive one (the OOM killer) and a user defined fallback behavior
> is more sensible than keep retrying in the page allocator." In the api
> comment in kvmalloc_node(), it says "__GFP_RETRY_MAYFAIL is supported,
> and it should be used only if kmalloc is preferable to the vmalloc fallback,
> due to visible performance drawbacks". But if you say above that for
> vmalloc, there is no nooom semantic, then presumably __GFP_RETRY_MAYFAIL
> should also be taken as a hint wrt OOM, not a guarantee for the given
> allocation request (if kvmalloc_node selects the vmalloc based allocator),
> is that correct?

Yes, kvmalloc* doesn't have the full MAYFAIL/NORETRY semantic because
this is not possible to implement without reworking the whole page table
allocation code or making both behave like scoped NOFS/NOIO.

But I believe you shouldn't really have to care. Does the same problem
really happen with a plain kvmalloc(GFP_KERNEL)?
-- 
Michal Hocko
SUSE Labs



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux