On 15/09/2023 12:00 am, Thomas Gleixner wrote: >> and despite what Juergen said, I'm going to recommend that you do wire >> this through the paravirt infrastructure, for the benefit of regular >> users having a nice API, not because XenPV is expecting to do something >> wildly different here. > I fundamentaly hate adding this to the PV infrastructure. We don't want > more PV ops, quite the contrary. What I meant was "there should be the two top-level APIs, and under the covers they DTRT". Part of doing the right thing is to wire up paravirt for configs where that is specified. Anything else is going to force people to write logic of the form: if (WRMSRNS && !XENPV) wrmsr_ns(...) else wrmsr(...) which is going to be worse overall. And there really is one example of this antipattern already in the series. > For the initial use case at hand, there is an explicit FRED dependency > and the code in question really wants to use WRMSRNS directly and not > through a PV function call. > > I agree with your reasoning for the more generic use case where we can > gain performance independent of FRED by using WRMSRNS for cases where > the write has no serialization requirements. > > But this made me look into PV ops some more. For actual performance > relevant code the current PV ops mechanics are a horrorshow when the op > defaults to the native instruction. > > Let's look at wrmsrl(): > > wrmsrl(msr, val > wrmsr(msr, (u32)val, (u32)val >> 32)) > paravirt_write_msr(msr, low, high) > PVOP_VCALL3(cpu.write_msr, msr, low, high) > > Which results in > > mov $msr, %edi > mov $val, %rdx > mov %edx, %esi > shr $0x20, %rdx > call native_write_msr > > and native_write_msr() does at minimum: > > mov %edi,%ecx > mov %esi,%eax > wrmsr > ret Yeah, this is daft. But it can also be fixed irrespective of WRMSRNS. WRMSR has one complexity that most other PV-ops don't, and that's the exception table reference for the instruction itself. In a theoretical future ought to look like: mov $msr, %ecx mov $lo, %eax mov $hi, %edx 1: {call paravirt_blah(%rip) | cs...cs wrmsr | cs...cs wrmsrns } _ASM_EXTABLE(1b, ...) In paravirt builds, the CALL needs to be the emitted form, because it needs to function in very early boot. But once the paravirt-ness has been chosen and alternatives run, the as-native paths are fully inline. The alternative which processes this site wants to conclude that, in the case it does not alter from the CALL, to clobber the EXTABLE reference. CALL instructions can #GP, and you don't want to end up thinking you're handling a WRMSR #GP when in fact it was a non-canonical function pointer. > In the worst case 'ret' is going through the return thunk. Not to talk > about function prologues and whatever. > > This becomes even more silly for trivial instructions like STI/CLI or in > the worst case paravirt_nop(). STI/CLI are already magic. Are they not inlined? > The call makes only sense, when the native default is an actual > function, but for the trivial cases it's a blatant engineering > trainwreck. > > I wouldn't care at all if CONFIG_PARAVIRT_XXL would be the esoteric use > case, but AFAICT it's default enabled on all major distros. > > So no. I'm fundamentally disagreeing with your recommendation. The way > forward is: > > 1) Provide the native variant for wrmsrns(), i.e. rename the proposed > wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS > safety net as you pointed out. > > That function can be used in code which is guaranteed to be not > affected by the PV_XXL madness. > > 2) Come up with a sensible solution for the PV_XXL horrorshow > > 3) Implement a sane general variant of wrmsr_ns() which handles > both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL > > 4) Convert other code which benefits from the non-serializing variant > to wrmsr_ns() Well - point 1 is mostly work that needs reverting as part of completing point 3, and point 2 clearly needs doing irrespective of anything else. Thanks, ~Andrew