On 05/06/2023 5:35 pm, Josh Poimboeuf wrote:
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>
>> That way, we wouldn't have to mess with the internal LFENCE in nospec-branch.h,
>> and that would act as the "final line of defense" LFENCE.
>>
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>> before any sort of calls no matter what?
>
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
>
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
> sorry.  My original implementation of that function was actually asm.  I
> can try to dig up that code.

Lemme see if I can remember the whole safety position.  I've just gone
through a year's worth of notes and conversations, including the
following gems:

From Intel: "And then on top of that, the LFENCE in
vmx_spec_ctrl_restore_host would save you.  Fingers crossed."

From me: "The how and why is important.  Not for right now, but for the
years to come when all of this logic inevitably gets
tweaked/refactored/moved".

Date-check says 11 months...

__vmx_vcpu_run() is a leaf function as far as the kernel stack is
concerned, so to a first approximation, it behaves as follows:

  VMEXIT
  RET

The RET gets a prediction from the top of the RSB (a guest-controlled
value), but during execution the prediction is discovered to be wrong,
so a resteer occurs, causing it to restart from __vmx_vcpu_run()'s
caller.
Skipping the middle ground of a 32-entry RSB stuffing loop, we have the
following behaviour on eIBRS parts:

  VMEXIT                        (flush RSB if IBRS was 1 in guest)
  Restore Host MSR_SPEC_CTRL    (flush RSB if IBRS was 0 in guest)
  RET

where the RET (in theory) encounters an RSB-empty condition and falls
back to an indirect prediction.

PBRSB is a missing corner case in the hardware RSB flush, which is only
corrected after one CALL instruction architecturally retires.  The
problem with mitigating it, however, is that it is heavily entangled
with BCBS, so I refer you to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/analysis-speculative-execution-side-channels.html#inpage-nav-undefined-1-undefined
which describes one example of how RETs can go wrong.

The critical observation is that while for PBRSB it's the first
unbalanced RET which causes problems, as soon as *any* RET has executed
and got a bad resteer (even temporarily), you've lost all ability to
contain the damage.

So, to protect against PBRSB, one CALL instruction must retire before
*any* RET instruction executes.

Pawan's patch to turn the unconditional lfence into an else-lfence
should be safe, seeing as Intel's guidance
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
explicitly says you can use the WRMSR to restore the host MSR_SPEC_CTRL
value as the speculation barrier.

But the safety of vmx_spec_ctrl_restore_host() in the first place
depends on the early return never, ever becoming conditional, and on
the compiler never emitting a call to memcpy()/memset()/whatever behind
your back - something which is not prohibited by noinstr.

I hope this clears things up.

~Andrew