On Mon, Nov 21, 2022 at 07:28:36PM -0700, Song Liu wrote:
> On Mon, Nov 21, 2022 at 1:12 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 17, 2022 at 12:23:16PM -0800, Song Liu wrote:
> > > This patchset tries to address the following issues:
> > >
> > > Based on our experiments [5], we measured ~0.6% performance improvement
> > > from bpf_prog_pack. This patchset further boosts the improvement to ~0.8%.
> >
> > I'd prefer we leave out arbitrary performance data, as it does not help much.
>
> This really bothers me. With real workload, we are talking about performance
> difference of ~1%. I don't think there is any open source benchmark that can
> show this level of performance difference.

I *highly* doubt that.

> In our case, we used A/B test with 80 hosts (40 vs. 40) and runs for
> many hours to confidently show 1% performance difference. This exact
> benchmark has a very good record of reporting smallish performance
> regression.

As per Wikipedia, "A/B tests are useful for understanding user engagement
and satisfaction of online features like a new feature or product". Let us
disregard what is going on with user experience and consider instead
evaluating the performance of what goes on behind the scenes.

> For example, this commit
>
> commit 7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace
> flips RW on kernel text")
>
> fixes a bug that splits the page table (from 2MB to 4kB) for the WHOLE kernel
> text. The bug stayed in the kernel for almost a year. None of all the available
> open source benchmark had caught it before this specific benchmark.

That doesn't mean enterprise level testing would not have caught it, and
enterprise distributions run ancient kernels, so they would not catch up
that fast. RHEL uses even more ancient kernels than SUSE, so let's consider
where SUSE was during this regression.

The commit you mentioned, the fix 7af0145067bc, went upstream in
v5.3-rc7~4^2, and that was in August 2019. The bug was introduced through
commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely"),
which went in via v4.20-rc1~159^2~41 around September 2018.

Around September 2018, the time the regression was committed, the most
bleeding edge Enterprise Linux kernel in the industry was the one on SLE15,
and so v4.12, and so there is no way in hell the performance team at SUSE,
for instance, would have even come close to evaluating code with that
regression. In fact, they wouldn't come across it in testing until
SLE15-SP2 on the v5.3 kernel, but by then the regression would have been
fixed.

Yes, 0-day does *some* performance testing, but it does not do justice to
the monumental effort that goes into performance testing at Enterprise
Linux distributions. The gap that leaves should perhaps be solved in the
community long term, however that's a separate problem. But to suggest that
there is *nothing* like what you have is probably pretty inaccurate.

> We have used this benchmark to demonstrate performance benefits of many
> optimizations. I don't understand why it suddenly becomes "arbitrary
> performance data".

It's because typically you'd want a benchmark you can reproduce results
with, and some "A/B testing" reference really doesn't help future
developers who are evaluating performance regressions, or who would want to
provide critical feedback to you on things you may have overlooked when
selling a generic performance improvement into the kernel.

  Luis