Re: help with performance loss question

Bob Pearson <rpearsonhpe@xxxxxxxxx> · Wed, 20 Sep 2023 18:43:05 -0500

On 9/20/23 15:52, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2023 at 02:54:42PM -0500, Bob Pearson wrote:
>> Jason,
>>
>> I am trying to figure out what caused a big drop in performance in the rxe driver between
>> v6.5-rc5 and v6.5-rc6. The maximum performance for 'ib_send_bw -F -a' in local loopback mode
>> dropped from about 1.9GB/sec to 1.1GB/sec between these two tags. I have also measured the performance
>> of a 6.5 kernel with the 6.4 rxe driver and 6.4 infiniband/core drivers and that also shows the lower
>> performance so it is not something in the rdma subsystem. (In fact there were no changes in the rxe
>> driver from 6.5-rc5 to 6.5-rc6.)
>>
>> If I type 'git log --oneline v6.5-rc6 ^v6.5-rc5' I get about 360 lines but many of them are merge sets
>> that can contain many patches. Is there a way to list all the patches contained between these two
>> tags?
> 
> I recommend you just do a git bisection, it will be more robust and
> 360 patches will not take many steps
> 
> Jason

Thanks. I narrowed it down to the mitigation for the AMD Inception (SRSO) vulnerability that was
added in v6.5-rc6. It's a huge performance hit. I think there is a way to turn it off.
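
For anyone who wants to check their own box, here is a minimal sketch of how the
mitigation state could be inspected. It assumes the kernel reports the status in
/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow and that the mitigation
is controlled by the spec_rstack_overflow= boot parameter (with "off" disabling it),
as described in the kernel's SRSO admin guide. The helper names srso_status() and
srso_cmdline_option() are made up for this example.

```python
from pathlib import Path

# Assumed location of the SRSO status file on kernels that carry the mitigation.
SRSO_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")


def srso_cmdline_option(cmdline):
    """Return the value given to spec_rstack_overflow= in a kernel command line.

    Returns None if the option is absent. If it appears more than once, this
    sketch simply keeps the last value it sees.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "spec_rstack_overflow" and sep:
            value = val
    return value


def srso_status(path=SRSO_SYSFS):
    """Return the kernel's SRSO status string, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("sysfs status:  ", srso_status())
    try:
        boot_cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        boot_cmdline = ""
    print("cmdline option:", srso_cmdline_option(boot_cmdline))
```

Booting with spec_rstack_overflow=off on the kernel command line should then make it
possible to re-run 'ib_send_bw -F -a' without the mitigation, for benchmarking only,
and compare against the mitigated numbers.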

commit fb3bd914b3ec28f5fb697ac55c4846ac2d542855
Author: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Date:   Wed Jun 28 11:02:39 2023 +0200

    x86/srso: Add a Speculative RAS Overflow mitigation

    Add a mitigation for the speculative return address stack overflow
    vulnerability found on AMD processors.

    The mitigation works by ensuring all RET instructions speculate to
    a controlled location, similar to how speculation is controlled in the
    retpoline sequence.  To accomplish this, the __x86_return_thunk forces
    the CPU to mispredict every function return using a 'safe return'
    sequence.

    To ensure the safety of this mitigation, the kernel must ensure that the
    safe return sequence is itself free from attacker interference.  In Zen3
    and Zen4, this is accomplished by creating a BTB alias between the
    untraining function srso_untrain_ret_alias() and the safe return
    function srso_safe_ret_alias() which results in evicting a potentially
    poisoned BTB entry and using that safe one for all function returns.

    In older Zen1 and Zen2, this is accomplished using a reinterpretation
    technique similar to Retbleed one: srso_untrain_ret() and
    srso_safe_ret().

    Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>

Apparently it requires a kernel fix for Zen 1/2 but can be fixed with updated microcode
for Zen 3/4. Since I am doing dev on a Zen 2 (3900X) CPU, I'll replicate the perf testing
on my second system, which is a Zen 3 box, to see if it is better.

Bob