On Fri, Sep 13, 2024 at 03:51:01PM +0000, Jon Kohler wrote:
>
> > On Sep 13, 2024, at 1:28 AM, Chao Gao <chao.gao@xxxxxxxxx> wrote:
> >
> > On Thu, Sep 12, 2024 at 09:24:40AM -0700, Pawan Gupta wrote:
> >> On Thu, Sep 12, 2024 at 03:44:38PM +0000, Jon Kohler wrote:
> >>>> It is only worth implementing the long sequence in VMEXIT_ONLY mode if it is
> >>>> significantly better than toggling the MSR.
> >>>
> >>> Thanks for the pointer! I hadn’t seen that second sequence. I’ll do measurements on
> >>> three cases and come back with data from an SPR system.
> >>> 1. as-is (wrmsr on entry and exit)
> >>> 2. Short sequence (as a baseline)
> >>> 3. Long sequence
> >>
>
> Pawan,
>
> Thanks for the pointer to the long sequence. I've tested it along with
> Listing 3 (TSX Abort sequence) using KUT tscdeadline_immed test. TSX
> abort sequence performs better unless BHI mitigation is off or
> host/guest spec_ctrl values match, avoiding WRMSR toggling. Having the
> values match the DIS_S value is easier said than done across a fleet
> that is already using eIBRS heavily.
>
> Test System:
> - Intel Xeon Gold 6442Y, microcode 0x2b0005c0
> - Linux 6.6.34 + patches, qemu 8.2
> - KVM Unit Tests @ latest (17f6f2fd) with tscdeadline_immed + edits:
>   - Toggle spec ctrl before test in main()
>   - Use cpu type SapphireRapids-v2
>
> Test string:
> TESTNAME=vmexit_tscdeadline_immed TIMEOUT=90s MACHINE= ACCEL= taskset -c 26 ./x86/run x86/vmexit.flat \
>   -smp 1 -cpu SapphireRapids-v2,+x2apic,+tsc-deadline -append tscdeadline_immed |grep tscdeadline
>
> Test Results:
> 1. spectre_bhi=on, host spec_ctrl=1025, guest spec_ctrl=1: tscdeadline_immed 3878 (WRMSR toggling)
> 2. spectre_bhi=on, host spec_ctrl=1025, guest spec_ctrl=1025: tscdeadline_immed 3153 (no WRMSR toggling)
> 3. spectre_bhi=vmexit, BHB long sequence, host/guest spec_ctrl=1: tscdeadline_immed 3629 (still better than test 1, penalty only on exit)
> 4. spectre_bhi=vmexit, TSX abort sequence, host/guest spec_ctrl=1: tscdeadline_immed 3294 (best general purpose performance)

This looks promising.

> 5. spectre_bhi=vmexit, TSX abort sequence, host spec_ctrl=1, guest spec_ctrl=1025: tscdeadline_immed 4011 (needs optimization)

Once QEMU adds support for exposing BHI_CTRL, this is a very likely
scenario. To optimize this, the host needs to have BHI_DIS_S set. We also
need to account for the case where some guests set BHI_DIS_S and others
don't.

> In short, there is a significant speedup to be had here.
>
> As for test 5, honest that is somewhat invalid because it would be
> dependent on the VMM user space showing BHI_CTRL.

Right.

> QEMU as an example does not do that, so even with latest qemu and latest
> kernel, guests will still use BHB loop even on SPR++ today, and they
> could use the TSX loop with this proposed change if the VMM exposes RTM
> feature.

I did not know that QEMU does not expose CPUID.BHI_CTRL. Chao, could you
please help get this feature exposed in QEMU?

> I'm happy to post a V2 patch with my TSX changes, or take any other
> suggestions here.

With CPUID.BHI_CTRL exposed to guests, this:

> 2. spectre_bhi=on, host spec_ctrl=1025, guest spec_ctrl=1025: tscdeadline_immed 3153 (no WRMSR toggling)

will be the most common case, which is also the best performing. Isn't it
better to aim for this?
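
For anyone reading the numbers above: a minimal sketch that decodes the
spec_ctrl values and expresses each result relative to test 1. It assumes
the IA32_SPEC_CTRL bit layout from the Linux kernel's msr-index.h (IBRS is
bit 0, BHI_DIS_S is bit 10) and that a lower tscdeadline_immed figure is
better, as the comparisons in the thread imply. The helper names are
illustrative only and are not part of KVM or KVM Unit Tests; the toggle
check just restates the observation above that WRMSR toggling is avoided
when host and guest values match.

```python
# Assumed IA32_SPEC_CTRL bit positions (per Linux msr-index.h); only a subset.
SPEC_CTRL_BITS = {0: "IBRS", 1: "STIBP", 2: "SSBD", 10: "BHI_DIS_S"}

# tscdeadline_immed results from the thread, keyed by test number.
RESULTS = {1: 3878, 2: 3153, 3: 3629, 4: 3294, 5: 4011}


def decode_spec_ctrl(value):
    """Return the names of the known bits set in a spec_ctrl value."""
    return [name for bit, name in sorted(SPEC_CTRL_BITS.items())
            if value & (1 << bit)]


def needs_wrmsr_toggle(host, guest):
    """Hypothetical helper: toggling is needed only when the values differ."""
    return host != guest


def delta_vs_baseline(result, baseline=RESULTS[1]):
    """Percentage change relative to test 1 (negative means faster)."""
    return 100.0 * (result - baseline) / baseline


if __name__ == "__main__":
    for value in (1, 1025):
        print(f"spec_ctrl={value} ({value:#x}): {decode_spec_ctrl(value)}")
    for test, result in sorted(RESULTS.items()):
        print(f"test {test}: {result} ({delta_vs_baseline(result):+.1f}% vs test 1)")
```

Under these assumptions, 1025 (0x401) decodes to IBRS plus BHI_DIS_S and 1
to IBRS alone, which is why tests 1 and 5 are the two configurations that
pay for the MSR toggle.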