On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
>
> On 21/06/2024 15:35, Samuel Dobron wrote:
> > Hey all,
> >
> > Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> > ~January of this year.
> >
> > As already mentioned, mlx5 is the only driver affected by this regression.
> > Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> > the one already mentioned by Toke, another one [0] has been reported
> > in early February.
> > Btw. issue mentioned by Toke has been moved to Jira, see [5].
> >
> > Not sure all of you are able to see the content of [0], Jira says it's
> > RH-confidential.
> > So, I am not sure how much I can share without being fired :D. Anyway,
> > affected kernels have been released a while ago, so anyone can find it
> > on its own.
> > Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> > don't have data for any other XDP mode) in kernel-5.14 compared to
> > previous builds.
> >
> > From tests history, I can see (most likely) the same improvement
> > on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> > (partially) fixed?
> >
> > For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> > XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> > in issue).
> > So if you want to run tests on 6.10. please see [3].
> >
> > Summary XDP_DROP+mlx5@25G:
> > kernel       pps
> > <5.14        20.5M    baseline
> > >=5.14       19M      [0]
> > <6.4         19-20M   baseline for ELN kernels
> > >=6.4        15M      [4 and 5] (mentioned by Toke)
>
> + @Dragos
>
> That's about when we added several changes to the RX datapath.
> Most relevant are:
> - Fully removing the in-driver RX page-cache.
> - Refactoring to support XDP multi-buffer.
>
> We tested XDP performance before submission, I don't recall we noticed
> such a degradation.

Adding Carolina to post her analysis on this.

>
> I'll check with Dragos as he probably has these reports.
>
We only noticed a 6% degradation for XDP_DROP.

https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@xxxxxxxxxx/T/

> > >=6.10       ???      [3]
> > >=6.10rc2    17M-18M
> >
> >
> > > It looks like this is known since March, was this ever reported to Nvidia back
> > > then? :/
> >
> > Not sure if that's a question for me, I was told, filing an issue in
> > Bugzilla/Jira is where
> > our competences end. Who is supposed to report it to them?
> >
> > > Given XDP is in the critical path for many in production, we should think about
> > > regular performance reporting for the different vendors for each released kernel,
> > > similar to here [0].
> >
> > I think this might be the part of upstream kernel testing with LNST?
> > Maybe Jesper
> > knows more about that? Until then, I think, I can let you know about
> > new regressions we catch.
> >
> > Thanks,
> > Sam.
> >
> > [0] https://issues.redhat.com/browse/RHEL-24054
> > [1] https://koji.fedoraproject.org/koji/search?terms=kernel-%5Cd.*eln*&type=build&match=regexp
> > [2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2469107
> > [3] https://bugzilla.redhat.com/show_bug.cgi?id=2282969
> > [4] https://bugzilla.redhat.com/show_bug.cgi?id=2270408
> > [5] https://issues.redhat.com/browse/RHEL-24054
> >
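
For reference, the relative changes implied by the rounded figures in the
summary table can be recomputed with a few lines of Python. This is only a
sketch: the helper names (drop_pct, drop_range) are made up for illustration,
and the inputs are the rounded Mpps values from the table, so the results
need not match the percentages quoted elsewhere in the thread, which may
come from unrounded measurements.

```python
# Rounded Mpps figures copied from the summary table; ranges kept as (low, high).
# Helper names are hypothetical, not part of any test tooling.

def drop_pct(before, after):
    """Relative drop from `before` to `after`, in percent (negative = gain)."""
    return (before - after) / before * 100.0


def drop_range(before, after):
    """Smallest and largest drop for (low, high) input ranges."""
    a = drop_pct(before[0], after[1])  # lowest baseline vs. highest result
    b = drop_pct(before[1], after[0])  # highest baseline vs. lowest result
    return tuple(sorted((a, b)))


steps = [
    ("<5.14 -> >=5.14", (20.5, 20.5), (19.0, 19.0)),
    ("<6.4 -> >=6.4", (19.0, 20.0), (15.0, 15.0)),
    (">=6.4 -> >=6.10rc2", (15.0, 15.0), (17.0, 18.0)),
]

for label, before, after in steps:
    lo, hi = drop_range(before, after)
    print(f"{label}: drop between {lo:.1f}% and {hi:.1f}%")
```

A negative drop in the last row means throughput went up relative to the
15M figure, i.e. the partial recovery seen on 6.10rc2.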