On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
Hi,
The TD guest kernel has its own reasons to turn SEPT_VE on or off. E.g.,
a Linux TD guest requires SEPT_VE to be disabled to avoid a #VE in the
syscall gap [1].
Why is that a problem for a TD guest kernel? Installing exception
handlers is done quite early in the boot process, certainly before any
userspace code runs. So I think we should never see a syscall without
a #VE handler being installed. /me is confused.
Or do you want to tell me Linux has no #VE handler?
The problem is not "no #VE handler"; Linux does have a #VE handler. The
problem is that Linux doesn't want any (or certain) exceptions to occur
in the syscall gap, and that is not specific to #VE. Frankly, I don't
understand the reason clearly; it's something related to the IST usage
in the x86 Linux kernel.
Frankly speaking, this bit would be better configured by the TD guest
kernel; however, the current TDX architecture is designed to let the
VMM configure it.
Indeed. Requiring users to know the guest kernel's capabilities and
manually configure the VMM accordingly looks fragile to me.
Even better would be to not have that bit in the first place and require
TD guests to properly handle #VE exceptions.
This can cause problems with the "system call gap": a malicious
hypervisor might trigger a #VE for example on the system call entry
code, and when a user process does a system call it would trigger a
#VE, and SYSCALL relies on the kernel code to switch to the kernel stack,
this would lead to kernel code running on the ring 3 stack.
Hmm? Exceptions switch to kernel context too ...
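
A minimal sketch of the stack-selection rule at issue here, as a toy
model rather than real kernel code. It assumes the usual x86-64
behaviour: SYSCALL enters ring 0 without touching RSP, and exception
delivery only loads a new stack on a ring 3 to ring 0 transition or
when the IDT gate names an IST slot. The names CpuState, syscall and
deliver_exception are hypothetical, and whether the #VE gate uses IST
is left as a parameter instead of being asserted.

```python
from dataclasses import dataclass

USER_STACK = "user stack"
KERNEL_STACK = "kernel stack (TSS.RSP0)"
IST_STACK = "IST stack"


@dataclass
class CpuState:
    cpl: int    # current privilege level: 3 = user, 0 = kernel
    stack: str  # which stack RSP currently points into


def syscall(cpu: CpuState) -> CpuState:
    """Model SYSCALL: enter ring 0, but leave RSP untouched.

    Until the entry code loads the kernel stack itself, the CPU runs
    at CPL 0 on the user stack. That window is the "syscall gap".
    """
    return CpuState(cpl=0, stack=cpu.stack)


def deliver_exception(cpu: CpuState, ist_index: int = 0) -> CpuState:
    """Model which stack the CPU picks when delivering an exception."""
    if ist_index != 0:
        # An IST gate always switches to its dedicated stack.
        return CpuState(cpl=0, stack=IST_STACK)
    if cpu.cpl == 3:
        # Privilege change: the CPU loads RSP0 from the TSS.
        return CpuState(cpl=0, stack=KERNEL_STACK)
    # Already in ring 0 and no IST: the current stack is kept.
    return CpuState(cpl=0, stack=cpu.stack)


user = CpuState(cpl=3, stack=USER_STACK)

from_user = deliver_exception(user)               # exception in userspace
gap = syscall(user)                               # inside the syscall gap
in_gap = deliver_exception(gap)                   # non-IST exception in gap
in_gap_ist = deliver_exception(gap, ist_index=1)  # IST exception in gap

print(from_user.stack, in_gap.stack, in_gap_ist.stack, sep="\n")
```

In this model an exception taken from userspace does land on the
kernel stack, but a non-IST exception taken inside the gap keeps the
ring 3 stack, because the CPU is already at CPL 0 and sees no privilege
change.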
take care,
Gerd