Re: [PATCH bpf-next v1 RESEND 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Thu, 3 Nov 2022 20:29:21 -0700

On Thu, Nov 03, 2022 at 05:18:51PM -0700, Luis Chamberlain wrote:
> On Thu, Nov 03, 2022 at 09:19:25PM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2022-11-03 at 11:59 -0700, Luis Chamberlain wrote:
> > > > > Mike Rapoport had presented about the Direct map fragmentation
> > > > > problem
> > > > > at Plumbers 2021 [0], and clearly mentioned modules / BPF /
> > > > > ftrace /
> > > > > kprobes as possible sources for this. Then Xing Zhengjun's 2021
> > > > > performance
> > > > > evaluation on whether using 2M/1G pages aggressively for the
> > > > > kernel direct map
> > > > > helps performance [1] ends up generally recommending huge pages.
> > > > > The work by Xing
> > > > > though was about using huge pages *alone*, not using a strategy
> > > > > such as in the
> > > > > "bpf prog pack" to share one 2 MiB huge page for *all* small eBPF
> > > > > programs,
> > > > > and that I think is the real golden nugget here.
> > > > > 
> > > > > I contend therefore that the theoretical reduction of iTLB misses
> > > > > by using
> > > > > huge pages for "bpf prog pack" is not what gets your systems to
> > > > > perform
> > > > > somehow better. It should be simply that it reduces fragmentation
> > > > > and
> > > > > *this* generally can help with performance long term. If this is
> > > > > accurate
> > > > > then let's please separate the two aspects to this.
> > > > 
> > > > The direct map fragmentation is the reason for higher TLB miss
> > > > rate, both
> > > > for iTLB and dTLB.
> > > 
> > > OK so then whatever benchmark is running in tandem as eBPF JIT is
> > > hammered
> > > should *also* be measured with perf for iTLB and dTLB. ie, the patch
> > > can
> > > provide such results as a justification.
> > 
> > Song had done some tests on the old prog pack version that to me seemed
> > to indicate most (or possibly all) of the benefit was direct map
> > fragmentation reduction.
> 
> Matches my observations but I also provided quite a bit of hints as to
> *why* I think that is. I suggested lib/test_kmod.c as an example of a beefy
> multithreaded selftest which really kicks the hell out of the kernel
> with whatever crap you want to run. That is precisely how I uncovered
> some odd kmod bug lingering for years.

*and*, *perhaps*... it may be that you need another memory-intensive benchmark
to run in tandem, one which mimics the behaviour of the internal "shadow
production benchmark", whatever that is.
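
To make the "measure iTLB and dTLB with perf" part concrete, here is a rough
sketch of how the counters could be collected while both workloads run. None
of this is from the patch series: the helper names are made up, the events
are perf's generic cache events (availability varies by CPU), and I am
assuming the usual `perf stat -x,` CSV layout with the count in the first
field and the event name in the third.

```python
import csv
import io
import subprocess

# Generic perf cache events; whether each one is supported depends on the CPU.
EVENTS = ["iTLB-load-misses", "dTLB-load-misses", "dTLB-loads"]


def parse_perf_csv(text):
    """Parse `perf stat -x,` output into {event: count}, skipping unsupported events."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue
        value, event = row[0].strip(), row[2].strip()
        # Unsupported or uncounted events show up as "<not supported>" etc.
        if value.isdigit():
            counts[event] = int(value)
    return counts


def measure(seconds=10):
    """Count system-wide for `seconds` while the benchmarks run (needs perf privileges)."""
    cmd = ["perf", "stat", "-a", "-x", ",", "-e", ",".join(EVENTS),
           "--", "sleep", str(seconds)]
    # perf stat writes its counters to stderr, not stdout.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_perf_csv(result.stderr)


def dtlb_miss_ratio(counts):
    """dTLB load misses per dTLB load, or None if the loads counter is missing."""
    loads = counts.get("dTLB-loads", 0)
    return counts.get("dTLB-load-misses", 0) / loads if loads else None
```

The idea would be to call measure() once on a kernel without the patch and
once with it, under the same JIT stress plus the memory hog, and compare the
raw miss counts and the ratio.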

  Luis