Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 2 Sep 2022 15:26:35 +0000

On Fri, Sep 02, 2022, Gerd Hoffmann wrote:
> On Fri, Sep 02, 2022 at 02:52:25AM +0000, Sean Christopherson wrote:
> > On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> > > On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> > > >    Hi,
> > > > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
> > > > > linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
> > > > > [1].
> > > > 
> > > > Why is that a problem for a TD guest kernel?  Installing exception
> > > > handlers is done quite early in the boot process, certainly before any
> > > > userspace code runs.  So I think we should never see a syscall without
> > > > a #VE handler being installed.  /me is confused.
> > > > 
> > > > Or do you want to tell me Linux has no #VE handler?
> > > 
> > > The problem is not "no #VE handler"; Linux does have a #VE handler. The
> > > problem is that Linux doesn't want any (or certain) exceptions to occur in
> > > the syscall gap; it's not specific to #VE. Frankly, I don't understand the
> > > reason clearly, it's something related to the IST used in the x86 Linux kernel.
> > 
> > The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first instruction
> > at the SYSCALL entry point runs with a userspace-controlled RSP.  With TDX, a
> > malicious hypervisor can induce a #VE on the SYSCALL page and thus get the kernel
> > to run the #VE handler with a userspace stack.
> > 
> > The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded on #VE,
> > but ISTs are terrible because they don't play nice with re-entrancy (among other
> > reasons).  The RSP used for IST-based handlers is hardcoded, and so if a #VE
> > handler triggers another #VE at any point before IRET, the second #VE will clobber
> > the stack and hose the kernel.
> > 
> > It's possible to work around this, e.g. change the IST entry at the very beginning
> > of the handler, but it's a maintenance burden.  Since the only reason to use an IST
> > is to guard against a malicious hypervisor, Linux decided it would be just as easy
> > and more beneficial to avoid unexpected #VEs due to unaccepted private pages entirely.
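
To make the re-entrancy problem above concrete, here is a toy user-space model
in C.  It is only a sketch, not kernel code: struct toy_cpu, deliver() and
first_frame_survives() are made-up names, and the "exception frame" is reduced
to a single saved return address.  The only point it illustrates is that an IST
entry always reloads the same hardcoded stack top, so a nested exception lands
on top of the frame that is still in use.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy CPU: a small stack that grows downward; sp indexes the last pushed word. */
struct toy_cpu {
    uint64_t stack[STACK_WORDS];
    int sp;
};

/*
 * Model exception delivery.  With an IST entry the stack pointer is reset to
 * the same fixed top on every delivery; without one, the frame is pushed
 * below whatever is already on the current stack.
 */
static void deliver(struct toy_cpu *c, uint64_t return_ip, int use_ist)
{
    if (use_ist)
        c->sp = STACK_WORDS;        /* hardcoded IST top, regardless of nesting */
    c->stack[--c->sp] = return_ip;  /* "hardware" pushes the exception frame */
}

/* Returns 1 if the first frame is still intact after a nested exception. */
static int first_frame_survives(int use_ist)
{
    struct toy_cpu c = { .sp = STACK_WORDS };

    deliver(&c, 0x1000, use_ist);   /* first #VE, frame lands in the top slot */
    deliver(&c, 0x2000, use_ist);   /* nested #VE before the first IRET */

    return c.stack[STACK_WORDS - 1] == 0x1000;
}
```

In this model first_frame_survives(0) keeps the original return address, while
first_frame_survives(1) loses it, which is the "second #VE will clobber the
stack" case described above.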
> 
> Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?
> 
> Having a hypervisor-controlled config bit to protect against a malicious
> hypervisor looks pointless to me ...

IIRC, all (most?) of the attributes are included in the attestation report, so a
guest/customer can refuse to provision secrets to the guest if the hypervisor is
misbehaving.
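
As a rough sketch of what such a check could look like on the verifier side,
assuming the 64-bit TD ATTRIBUTES value has already been extracted from a
verified attestation report: td_attributes_acceptable() and the policy it
encodes are hypothetical, and the bit positions (DEBUG in bit 0,
SEPT_VE_DISABLE in bit 28) are what I believe the TDX module spec and Linux's
ATTR_SEPT_VE_DISABLE use, so double check them against the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within the TD ATTRIBUTES field (verify against the spec). */
#define TD_ATTR_DEBUG            (UINT64_C(1) << 0)
#define TD_ATTR_SEPT_VE_DISABLE  (UINT64_C(1) << 28)

/*
 * Hypothetical relying-party policy: only provision secrets to a TD that is
 * not debuggable and that was created with SEPT_VE_DISABLE set, i.e. the
 * hypervisor cannot inject #VE via unaccepted private pages.
 */
static bool td_attributes_acceptable(uint64_t attributes)
{
    if (attributes & TD_ATTR_DEBUG)
        return false;
    return (attributes & TD_ATTR_SEPT_VE_DISABLE) != 0;
}
```

The attribute is fixed at TD creation, so a check like this only has to run
once, before any secrets are released to the guest.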

I'm guessing Intel made it an attribute and not a dynamic control knob to simplify
the TDX module implementation.