On 23/06/2020 12:30, Joerg Roedel wrote:
> On Tue, Jun 23, 2020 at 01:07:06PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 28, 2020 at 09:55:12AM +0200, Joerg Roedel wrote:
>> So what happens if this #VC triggers on the first access to the #VC
>> stack, because the malicious host has craftily mucked with only the #VC
>> IST stack page?
>>
>> Or on the NMI IST stack, then we get #VC in NMI before the NMI can fix
>> you up.
>>
>> AFAICT all of that is non-recoverable.
> I am not 100% sure, but I think if the #VC stack page is not validated,
> the #VC should be promoted to a #DF.
>
> Note that this is an issue only with secure nested paging (SNP), which
> is not enabled yet with this patch-set. When it gets enabled a stack
> recursion check in the #VC handler is needed which panics the VM. That
> also fixes the #VC-in-early-NMI problem.

There are cases which are definitely non-recoverable.

For both ES and SNP, a malicious hypervisor can mess with the guest
physmap to make the NMI, #VC and #DF stacks all alias. For ES, this had
better result in the #DF handler deciding that crashing is the way out,
whereas for SNP, this had better escalate to Shutdown.

What matters here is the security model in SNP. The hypervisor is relied
upon for availability (because it could simply refuse to schedule the
VM), but market/business forces will cause it to do its best to keep the
VM running. Therefore, the security model is simply(?) that the
hypervisor can't do anything to undermine the confidentiality or
integrity of the VM.

Crashing out hard if the hypervisor is misbehaving is acceptable. In a
cloud, I as a customer would (threaten to?) take my credit card
elsewhere, while for enterprise, I'd shout at my virtualisation vendor
until a fix happened (also perhaps threaten to take my credit card
elsewhere).

Therefore, it is reasonable to build on the expectation that the
hypervisor won't be messing around with remapping stacks/etc.
Most #VC's are synchronous with guest actions (they equate to actions
which would have caused a VMExit), so you can safely reason about when
the first #VC might occur, presuming no funny business with the frames
backing any memory operands touched.

~Andrew