On 9/20/23 15:52, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2023 at 02:54:42PM -0500, Bob Pearson wrote:
>> Jason,
>>
>> I am trying to figure out what caused a big drop in performance in the rxe driver between
>> v6.5-rc5 and v6.5-rc6. The maximum performance for 'ib_send_bw -F -a' in local loopback mode
>> dropped from about 1.9GB/sec to 1.1GB/sec between these two tags. I have also measured the performance
>> of a 6.5 kernel with the 6.4 rxe driver and 6.4 infiniband/core drivers and that also shows the lower
>> performance so it is not something in the rdma subsystem. (In fact there were no changes in the rxe
>> driver from 6.5-rc5 to 6.5-rc6.)
>>
>> If I type 'git log --oneline v6.5-rc6 ^v6.5-rc5' I get about 360 lines but many of them are merge sets
>> that can contain many patches. Is there a way to list all the patches contained between these two
>> tags?
>
> I recommend you just do a git bisection, it will be more robust and
> 360 patches will not take many steps
>
> Jason

Thanks, I narrowed it down to the mitigation for the AMD/Inception vuln. that got added in v6.5-rc6.
It's a huge performance hit. I think there is a way to turn it off.

commit fb3bd914b3ec28f5fb697ac55c4846ac2d542855
Author: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Date:   Wed Jun 28 11:02:39 2023 +0200

    x86/srso: Add a Speculative RAS Overflow mitigation

    Add a mitigation for the speculative return address stack overflow
    vulnerability found on AMD processors.

    The mitigation works by ensuring all RET instructions speculate to
    a controlled location, similar to how speculation is controlled in the
    retpoline sequence. To accomplish this, the __x86_return_thunk forces
    the CPU to mispredict every function return using a 'safe return'
    sequence.

    To ensure the safety of this mitigation, the kernel must ensure that the
    safe return sequence is itself free from attacker interference. In Zen3
    and Zen4, this is accomplished by creating a BTB alias between the
    untraining function srso_untrain_ret_alias() and the safe return
    function srso_safe_ret_alias() which results in evicting a potentially
    poisoned BTB entry and using that safe one for all function returns.

    In older Zen1 and Zen2, this is accomplished using a reinterpretation
    technique similar to Retbleed one: srso_untrain_ret() and
    srso_safe_ret().

    Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>

Apparently it requires a kernel fix for zen 1/2 but can be fixed with updated microcode for zen 3/4.
Since I am doing dev on a zen 2 (3900X) cpu, I'll replicate the perf testing on my second system,
which is a zen 3 box, to see if it is better.

Bob
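
P.S. For comparing the two boxes, here is a rough sketch of how one might check whether the SRSO mitigation is active before each perf run. It assumes the kernel reports its status in /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow. The helper names (srso_status, classify) are made up for illustration, and the prefixes matched below are my assumption about the status strings the kernel prints, so treat the classification as approximate. As far as I can tell, the mitigation can be disabled for an A/B measurement by booting with spec_rstack_overflow=off on the kernel command line.

```python
from pathlib import Path

# sysfs file where the kernel reports SRSO (Inception) mitigation status.
SRSO_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")


def srso_status(path=SRSO_SYSFS):
    """Return the raw status string, or None if the file cannot be read
    (e.g. a kernel older than the mitigation, or a non-Linux system)."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def classify(status):
    """Map a raw status string to a coarse label.

    The prefixes checked here are assumptions about the kernel's wording
    ("Not affected", "Vulnerable...", "Mitigation: ...").
    """
    if status is None:
        return "unknown"
    lowered = status.lower()
    if lowered.startswith("not affected"):
        return "not affected"
    if lowered.startswith("vulnerable"):
        return "unmitigated"
    if lowered.startswith("mitigation"):
        return "mitigated"
    return "unknown"


if __name__ == "__main__":
    raw = srso_status()
    print(f"SRSO status: {raw!r} -> {classify(raw)}")
```

Running this once with the default boot options and once with spec_rstack_overflow=off should show whether the ib_send_bw numbers being compared were taken with the mitigation on or off.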