Re: [PATCH bpf-next v1 RESEND 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec

On Mon, Nov 07, 2022 at 02:40:14PM +0800, Aaron Lu wrote:
> Hello,
> 
> On Wed, Nov 02, 2022 at 04:41:59PM -0700, Luis Chamberlain wrote:
> 
> ... ...
> 
> > I'm under the impression that the real missed, undocumented, major value-add
> > here is that the old "BPF prog pack" strategy helps to reduce the direct map
> > fragmentation caused by heavy use of the eBPF JIT programs and this in
> > turn helps your overall random system performance (regardless of what
> > it is you do). As I see it then the eBPF prog pack is just one strategy to
> > try to mitigate memory fragmentation on the direct map caused by the eBPF
> > JIT programs, so the "slow down" your team has observed should be due to the
> > eventual fragmentation caused on the direct map *while* eBPF programs
> > get heavily used.
> > 
> > Mike Rapoport presented on the direct map fragmentation problem
> > at Plumbers 2021 [0], and clearly mentioned modules / BPF / ftrace /
> > kprobes as possible sources for this. Then Xing Zhengjun's 2021 performance
> > evaluation on whether using 2M/1G pages aggressively for the kernel direct map
> > helps performance [1] ends up generally recommending huge pages. The work by Xing
> > though was about using huge pages *alone*, not using a strategy such as in the
> > "bpf prog pack" to share one 2 MiB huge page for *all* small eBPF programs,
> > and that I think is the real golden nugget here.
> 
> I'm interested in how this patchset (further) reduces direct map
> fragmentation, so I would like to evaluate it to see if my previous work to
> merge small mappings back in the architecture layer [1] is still necessary.

You have to apply it to 6.0.5, which had a large eBPF change go in
that was not present in 6.0.

> Conclusion: I think bpf_prog_pack is very good at reducing direct map
> fragmentation and this patchset can further improve this situation on
> large machines (with huge amounts of memory) or when more large bpf progs
> are loaded, etc.

Fantastic. Thanks for the analysis; this is yet another set of metrics which
I'd hope can be applied to this patch set as this effort is generalized.
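
As a rough way to watch that metric yourself, the x86 direct map counters
in /proc/meminfo (DirectMap4k / DirectMap2M / DirectMap1G) can be compared
before and after loading a pile of programs. A trivial helper, purely
illustrative and not part of this series:

/* Dump the x86 direct map counters from /proc/meminfo; comparing the
 * output before and after loading BPF programs gives a rough view of
 * how much of the direct map has been split down to 4k mappings. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f) {
		perror("fopen /proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "DirectMap", strlen("DirectMap")))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}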

Now imagine the effort also in consideration of modules / ftrace / kprobes.
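
To make the shared-hugepage idea above concrete, here is a minimal
userspace sketch, purely illustrative: one 2 MiB region is carved into
many small executable chunks and gets a single permission flip, instead of
each program getting its own mapping and its own page-permission change.
(The real series copies code into already-executable memory via
vcopy_exec; here the writes simply happen before the flip, and all names
are made up for the example.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PACK_SIZE (2UL * 1024 * 1024)	/* one shared 2 MiB "pack" */

static uint8_t *pack;
static size_t pack_used;

/* Bump-allocate a chunk of the shared pack and copy an image into it. */
static void *pack_alloc(const void *image, size_t len)
{
	void *slot;

	if (pack_used + len > PACK_SIZE)
		return NULL;
	slot = pack + pack_used;
	memcpy(slot, image, len);
	pack_used += len;
	return slot;
}

int main(void)
{
	/* Stand-ins for two tiny JITed images (x86 'ret'). */
	static const uint8_t prog_a[] = { 0xc3 };
	static const uint8_t prog_b[] = { 0xc3 };
	void *a, *b;

	pack = mmap(NULL, PACK_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pack == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Many small programs share the single region... */
	a = pack_alloc(prog_a, sizeof(prog_a));
	b = pack_alloc(prog_b, sizeof(prog_b));

	/* ...and the permission change happens once for the whole 2 MiB
	 * instead of once per program on scattered 4 KiB pages. */
	if (mprotect(pack, PACK_SIZE, PROT_READ | PROT_EXEC)) {
		perror("mprotect");
		return 1;
	}

	printf("prog_a at %p, prog_b at %p, %zu bytes used\n", a, b, pack_used);
	return 0;
}

Nothing here is meant to mirror the kernel code path; only the packing and
permission-granularity idea matters for the direct map discussion.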

> Some imperfect things I can think of are (not related to this patchset):
> 1 Once a split happens, it stays split. This may not be a big deal
> now with bpf_prog_pack and this patchset because the need to allocate a
> new order-9 page, and thus cause a potential split, should happen much
> less often;

Not sure I follow, are you suggesting an order-9 (512-page, i.e. 2 MiB)
allocation would trigger a split of the reserved, say, 2 MiB huge page?

> 2 When a new order-9 page has to be allocated, there is no way to tell
> the allocator to allocate this order-9 page from an already split PUD
> range to avoid another PUD mapping split;
> 3 As Mike and others have mentioned, there are other users that can also
> cause direct map splits.

Hence the effort to generalize.

  Luis



