On Mon, Jul 11, 2022 at 02:15:07PM +0200, Michal Hocko wrote:
> On Sun 10-07-22 07:32:13, Shakeel Butt wrote:
> > On Sat, Jul 09, 2022 at 10:26:23PM -0700, Alexei Starovoitov wrote:
> > > On Fri, Jul 8, 2022 at 2:55 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > [...]
> > > >
> > > > Most probably Michal's comment was on free objects sitting in the caches
> > > > (also pointed out by Yosry). Should we drain them on memory pressure /
> > > > OOM or should we ignore them as the amount of memory is not significant?
> > >
> > > Are you suggesting to design a shrinker for 0.01% of the memory
> > > consumed by bpf?
> >
> > No, just claim that the memory sitting on such caches is insignificant.
>
> yes, that is not really clear from the patch description. Earlier you
> have said that the memory consumed might go into GBs. If that is a
> memory that is actively used and not really reclaimable then bad luck.
> There are other users like that in the kernel and this is not a new
> problem. I think it would really help to add a counter to describe both
> the overall memory claimed by the bpf allocator and actively used
> portion of it. If you use our standard vmstat infrastructure then we can
> easily show that information in the OOM report.

The OOM report can potentially be extended with info about bpf-consumed
memory, but it's not clear whether that would help OOM analysis.
"bpftool map show" already prints all map data. Some devs use bpf itself
to inspect bpf maps for finer details at run-time, and drgn scripts pull
that data from crash dumps. There is no need for new counters. The idea
of bpf-specific counters/limits was rejected by the memcg folks.

> OK, thanks for the clarification. There is still one thing that is not
> really clear to me. Without a proper ownership bound to any process why
> is it desired/helpful to account the memory to a memcg?

The first step is to have a limit. memcg provides it.
> We have discussed something similar in a different email thread and I
> still didn't manage to find time to put all the parts together. But if
> the initiator (or however you call the process which loads the program)
> exits then this might be the last process in the specific cgroup and so
> it can be offlined and mostly invisible to an admin.

Roman already sent a reparenting fix:
https://patchwork.kernel.org/project/netdevbpf/patch/20220711162827.184743-1-roman.gushchin@xxxxxxxxx/

> As you have explained there is nothing really actionable on this memory
> by the OOM killer either. So does it actually buy us much to account?

It will be actionable. One step at a time.
In the other thread we've discussed an idea to make the memcg selectable
when bpf objects are created. The user might create a special memcg and
use it for all things bpf. This might be the way to provide bpf-specific
accounting and limits.
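The dedicated-memcg idea can already be approximated with today's cgroup v2
interface, since bpf objects are charged to the memcg of the creating
process (the "selectable memcg at object creation" API is still only a
proposal). A rough sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup;
the cgroup name, the 512M limit, and the prog.bpf.o path are made-up
examples:

```shell
# Sketch only: create a dedicated memcg for bpf-heavy workloads and cap it.
mkdir /sys/fs/cgroup/bpf-memcg
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/bpf-memcg/memory.max

# Move the loader shell into that memcg *before* creating bpf objects,
# so map/prog allocations are charged (and limited) there:
echo $$ > /sys/fs/cgroup/bpf-memcg/cgroup.procs
bpftool prog load ./prog.bpf.o /sys/fs/bpf/prog   # example object/pin path

# memory.current now includes the accounted bpf memory:
cat /sys/fs/cgroup/bpf-memcg/memory.current
```

If allocations exceed memory.max, the charge fails or the memcg OOM path
fires inside this cgroup only, which is the per-workload limit being argued
for above.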