On Wed 06-07-22 11:05:25, Alexei Starovoitov wrote:
> On Wed, Jul 06, 2022 at 06:55:36PM +0100, Matthew Wilcox wrote:
[...]
> > For example, I assume that a BPF program
> > has a fairly tight limit on how much memory it can cause to be allocated.
> > Right?
>
> No. It's constrained by memcg limits only. It can allocate gigabytes.

I have very briefly had a look at the core allocator parts (please note
that my understanding of BPF is really close to zero, so I might be
missing a lot of implicit stuff). So by "constrained by memcg" you mean
the __GFP_ACCOUNT done from the allocation context (irq_work). The
complete gfp mask is
	GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_ACCOUNT
which means the allocation is not allowed to sleep, and GFP_ATOMIC
implies __GFP_HIGH, i.e. access to memory reserves is allowed. The
memcg charging code interprets this to mean that the hard limit can be
breached, on the assumption that such allocations are rare and will be
compensated for in some way. The bulk allocator implemented here,
however, doesn't reflect that and keeps allocating as long as it sees
success, so the breach of the limit is bounded only by the number of
objects to be allocated. If that number can be really large then this
is a clear problem and the use of __GFP_HIGH is not really appropriate.

Also, I do not see any tracking of the overall memory sitting in these
pools, and I think that would be really appropriate. As there doesn't
seem to be any reclaim mechanism implemented, this can hide quite some
unreachable memory.

Finally, it is not really clear what kind of entity the lifetime of
these caches is bound to. Let's say the system goes OOM: is any process
responsible for this memory, and would a cleanup be done if that
process gets killed?
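To make the limit breaching concern more concrete, something along the
lines of the sketch below would bound the overshoot to a single object
per refill. Note this is only an illustration with made-up names
(bpf_ma_refill etc.), not the actual allocator code, and it assumes
each object is large enough to hold a struct llist_node (which is how
the free list is threaded through the objects):

#include <linux/gfp.h>
#include <linux/llist.h>
#include <linux/slab.h>

/* Sketch only: names are illustrative, not the real bpf_mem_alloc code. */
static void bpf_ma_refill(struct llist_head *free_list, size_t unit_size,
			  int node, int cnt)
{
	/* GFP_ATOMIC implies __GFP_HIGH, which the memcg charging path
	 * takes as permission to breach the hard limit. */
	gfp_t gfp = GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN |
		    __GFP_ACCOUNT;
	int i;

	for (i = 0; i < cnt; i++) {
		/* Assumes unit_size >= sizeof(struct llist_node). */
		void *obj = kmalloc_node(unit_size, gfp, node);

		if (!obj)
			break;
		llist_add((struct llist_node *)obj, free_list);
		/* After the first success, stop relying on the breach
		 * allowance so the overshoot is bounded by one object
		 * rather than by cnt. */
		gfp &= ~__GFP_HIGH;
	}
}

Whether dropping __GFP_HIGH mid-batch is acceptable for the irq_work
context is a separate question, but at least the overshoot would no
longer scale with the batch size.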
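For the tracking part, even a single global counter updated on every
refill and free would make the pooled memory observable, e.g. in the
OOM report. Again just a sketch, the names are made up:

#include <linux/atomic.h>

/* Hypothetical counter of all the memory sitting in these caches. */
static atomic_long_t bpf_ma_pooled_bytes = ATOMIC_LONG_INIT(0);

static inline void bpf_ma_account_alloc(size_t bytes)
{
	atomic_long_add(bytes, &bpf_ma_pooled_bytes);
}

static inline void bpf_ma_account_free(size_t bytes)
{
	atomic_long_sub(bytes, &bpf_ma_pooled_bytes);
}

Thanks!
-- 
Michal Hocko
SUSE Labs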