Re: [PATCH bpf-next v4 0/6] execmem_alloc for BPF programs

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Tue, 22 Nov 2022 16:21:21 -0800

On Mon, Nov 21, 2022 at 07:28:36PM -0700, Song Liu wrote:
> On Mon, Nov 21, 2022 at 1:12 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 17, 2022 at 12:23:16PM -0800, Song Liu wrote:
> > > This patchset tries to address the following issues:
> > >
> > > Based on our experiments [5], we measured ~0.6% performance improvement
> > > from bpf_prog_pack. This patchset further boosts the improvement to ~0.8%.
> >
> > I'd prefer we leave out arbitrary performance data, as it does not help much.
> 
> This really bothers me. With real workload, we are talking about performance
> difference of ~1%. I don't think there is any open source benchmark that can
> show this level of performance difference.

I *highly* doubt that.

> In our case, we used A/B test with 80 hosts (40 vs. 40) and runs for
> many hours to confidently show 1% performance difference. This exact
> benchmark has a very good record of reporting smallish performance
> regression.

As per Wikipedia, "A/B tests are useful for understanding user
engagement and satisfaction of online features like a new feature or
product". Let us disregard what is going on with user experience and
instead consider evaluating the performance of what goes on behind the
scenes.

> For example, this commit
> 
>   commit 7af0145067bc ("x86/mm/cpa: Avoid the 4k pages check completely")
> 
> fixes a bug that splits the page table (from 2MB to 4kB) for the WHOLE kernel
> text. The bug stayed in the kernel for almost a year. None of all the available
> open source benchmark had caught it before this specific benchmark.

That doesn't mean enterprise level testing would not have caught it, and
enterprise distributions run on ancient kernels so they would not catch up
that fast. RHEL uses even more ancient kernels than SUSE so let's consider
where SUSE was during this regression. The fix you mentioned, commit
7af0145067bc, went upstream in v5.3-rc7~4^2, and that was in August 2019.
The bug was introduced through commit 585948f4f695 ("x86/mm/cpa: Avoid
the 4k pages check completely") and that was in v4.20-rc1~159^2~41
around September 2018. Around September 2018, the time the regression was
committed, the most bleeding edge Enterprise Linux kernel in the industry was
the one in SLE15, based on v4.12, so there is no way in hell the performance
team at SUSE for instance would have even come close to evaluating code with
that regression. In fact, they wouldn't have come across it in testing until
SLE15-SP2 on the v5.3 kernel, but by then the regression would have been fixed.

Yes, 0-day does *some* performance testing, but it does not do any
justice to the monumental effort that goes into performance testing at
Enterprise Linux distributions. The gap that leaves should perhaps be
solved in the community long term; however, that's a separate problem.

But to suggest that there is *nothing* like what you have is probably
pretty inaccurate.

> We have used this benchmark to demonstrate performance benefits of many
> optimizations. I don't understand why it suddenly becomes "arbitrary
> performance data".

It's because typically you'd want a benchmark that others can reproduce
results with, and some "A/B testing" reference really doesn't help future
developers who are evaluating performance regressions, or who would want to
provide critical feedback to you on things you may have overlooked when
selling a generic performance improvement into the kernel.

  Luis