Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority

Yafang Shao <laoar.shao@xxxxxxxxx> · Sat, 2 Jul 2022 10:30:41 +0800

On Fri, Jul 1, 2022 at 5:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> Hi Yafang,
>
> On 6/29/22 5:48 PM, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
> >
> > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > too memory expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > it-avoiding issues caused by too much memory. So let's remove it.
> >
> > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > currently. But the memcg code can be improved to make
> > __GFP_KSWAPD_RECLAIM work well under memcg pressure.
>
> Ok, but could you also explain in commit desc why it's a specific problem
> to BPF hashtab?
>

It should be a specific problem to non-preallocated BPF maps.
BPF program has its special attribute that it can run without user
application, for example, when it is pinned.
So if the user agent exits after pinning the BPF program, and if the
BPF maps are not preallocated, the memory allocated by the maps will
be continuously charged to a memcg. Then it may happen that the memcg
will eventually be filled with kernel memory. That is not a usual case
for other memcg users. Finally the memcg can't limit it because it can
force charge.

Regarding the preallocated BPF maps, it doesn't have this problem
because all its memory is allocated at load time, and if the memory
exceeds the memcg limit, it won't be loaded.

> Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside
> of BPF e.g. under net/, so it's a generic memcg problem?
>

I haven't checked all other GFP_ATOMIC usage yet.
But per my understanding, the net/ code should have user applications,
and if it exceeds the memcg limit, the user applications will be
killed and then it will stop allocating new memory.

[ + Shakeel, Johannes ]

This issue can be fixed in the memcg charge path.  But the memg charge
path is fragile.
The force charge of GFP_ATOMIC was introduced in
commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
__GFP_ATOMIC charges") by checking __GFP_ATOMIC, and then got improved
in
commit 1461e8c2b6af ("memcg: unify force charging conditions") by
checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
__GFP_ATOMIC are set in GFP_AOMIC).
So it seems that once we fix a problem in memcg, another problem may
be introduced by this fix. So, if we want to fix it in memcg, we have
to carefully verify all the callsites. Now that we can fix it in BPF,
we'd better not modify the memcg code.  But maybe a comment is needed
in memcg, in case someone may change the behavior of GFP_ATOMIC in the
memg charge path once more, for example,

nomem:
+   /* Pls, don't force charge __GFP_ATOMIC allocation if
+    * __GFP_HIGH or __GFP_NOFAIL is not set with it.
+    */
    if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
        return -ENOMEM;
force:

> Why are lpm trie and local storage map for BPF not affected (at least I don't
> see them covered in the patch)?
>

I haven't verified lpm trie and local storage map yet.
I will verify them.

-- 
Regards
Yafang