On Fri, Jul 1, 2022 at 5:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > > Hi Yafang, > > On 6/29/22 5:48 PM, Yafang Shao wrote: > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially > > if we allocate too much GFP_ATOMIC memory. For example, when we set the > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can > > easily break the memcg limit by force charge. So it is very dangerous to > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC | > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate > > too much memory. > > > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is > > too memory expensive for some cases. That means removing __GFP_HIGH > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with > > it-avoiding issues caused by too much memory. So let's remove it. > > > > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither > > currently. But the memcg code can be improved to make > > __GFP_KSWAPD_RECLAIM work well under memcg pressure. > > Ok, but could you also explain in commit desc why it's a specific problem > to BPF hashtab? > It should be a specific problem to non-preallocated BPF maps. BPF program has its special attribute that it can run without user application, for example, when it is pinned. So if the user agent exits after pinning the BPF program, and if the BPF maps are not preallocated, the memory allocated by the maps will be continuously charged to a memcg. Then it may happen that the memcg will eventually be filled with kernel memory. That is not a usual case for other memcg users. Finally the memcg can't limit it because it can force charge. Regarding the preallocated BPF maps, it doesn't have this problem because all its memory is allocated at load time, and if the memory exceeds the memcg limit, it won't be loaded. > Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside > of BPF e.g. under net/, so it's a generic memcg problem? > I haven't checked all other GFP_ATOMIC usage yet. But per my understanding, the net/ code should have user applications, and if it exceeds the memcg limit, the user applications will be killed and then it will stop allocating new memory. [ + Shakeel, Johannes ] This issue can be fixed in the memcg charge path. But the memg charge path is fragile. The force charge of GFP_ATOMIC was introduced in commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges") by checking __GFP_ATOMIC, and then got improved in commit 1461e8c2b6af ("memcg: unify force charging conditions") by checking __GFP_HIGH (that is no problem because both __GFP_HIGH and __GFP_ATOMIC are set in GFP_AOMIC). So it seems that once we fix a problem in memcg, another problem may be introduced by this fix. So, if we want to fix it in memcg, we have to carefully verify all the callsites. Now that we can fix it in BPF, we'd better not modify the memcg code. But maybe a comment is needed in memcg, in case someone may change the behavior of GFP_ATOMIC in the memg charge path once more, for example, nomem: + /* Pls, don't force charge __GFP_ATOMIC allocation if + * __GFP_HIGH or __GFP_NOFAIL is not set with it. + */ if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH))) return -ENOMEM; force: > Why are lpm trie and local storage map for BPF not affected (at least I don't > see them covered in the patch)? > I haven't verified lpm trie and local storage map yet. I will verify them. -- Regards Yafang