On Wed, Aug 28, 2024 at 05:27:32PM +0300, Nikolay Borisov wrote: > > > On 28.08.24 г. 12:35 ч., Kirill A. Shutemov wrote: > > Memory access #VEs are hard for Linux to handle in contexts like the > > entry code or NMIs. But other OSes need them for functionality. > > There's a static (pre-guest-boot) way for a VMM to choose one or the > > other. But VMMs don't always know which OS they are booting, so they > > choose to deliver those #VEs so the "other" OSes will work. That, > > unfortunately has left us in the lurch and exposed to these > > hard-to-handle #VEs. > > > > The TDX module has introduced a new feature. Even if the static > > configuration is set to "send nasty #VEs", the kernel can dynamically > > request that they be disabled. Once they are disabled, access to private > > memory that is not in the Mapped state in the Secure-EPT (SEPT) will > > result in an exit to the VMM rather than injecting a #VE. > > > > Check if the feature is available and disable SEPT #VE if possible. > > > > If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE > > attribute is no longer reliable. It reflects the initial state of the > > control for the TD, but it will not be updated if someone (e.g. bootloader) > > changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to > > determine if SEPT #VEs are enabled or disabled. > > LGTM. However 2 minor suggestions which might be worth addressing. > > Reviewed-by: Nikolay Borisov <nik.borisov@xxxxxxxx> > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > > Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access") > > Cc: stable@xxxxxxxxxxxxxxx > > Acked-by: Kai Huang <kai.huang@xxxxxxxxx> > > --- > > arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++------- > > arch/x86/include/asm/shared/tdx.h | 10 +++- > > 2 files changed, 69 insertions(+), 17 deletions(-) > > > > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c > > index 08ce488b54d0..f969f4f5ebf8 100644 > > --- a/arch/x86/coco/tdx/tdx.c > > +++ b/arch/x86/coco/tdx/tdx.c > > @@ -78,7 +78,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args) > > } > > /* Read TD-scoped metadata */ > > -static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value) > > +static inline u64 tdg_vm_rd(u64 field, u64 *value) > > { > > struct tdx_module_args args = { > > .rdx = field, > > @@ -193,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg) > > __tdx_hypercall(&args); > > } > > +/* > > + * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure > > + * that no #VE will be delivered for accesses to TD-private memory. > > + * > > + * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM > > + * controls if the guest will receive such #VE with TD attribute > > + * ATTR_SEPT_VE_DISABLE. > > + * > > + * Newer TDX modules allow the guest to control if it wants to receive SEPT > > + * violation #VEs. > > + * > > + * Check if the feature is available and disable SEPT #VE if possible. > > + * > > + * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE > > + * attribute is no longer reliable. It reflects the initial state of the > > + * control for the TD, but it will not be updated if someone (e.g. bootloader) > > + * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to > > + * determine if SEPT #VEs are enabled or disabled. > > + */ > > +static void disable_sept_ve(u64 td_attr) > > +{ > > + const char *msg = "TD misconfiguration: SEPT #VE has to be disabled"; > > + bool debug = td_attr & ATTR_DEBUG; > > + u64 config, controls; > > + > > + /* Is this TD allowed to disable SEPT #VE */ > > + tdg_vm_rd(TDCS_CONFIG_FLAGS, &config); > > + if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) { > > Should you check for the presence of those controls in in > TDX_FEATURES0.PENDING_EPT_VIOLATION_V2 ? I.e perhaps this code can be put in > the same function that checks the presence of RBP_NO_MOD in a different > series by Kai Huang? No. TDX_FEATURES0 check is not required. This bit in TDCS_CONFIG_FLAGS cannot be anything else than FLEXIBLE_PENDING_VE and checking only this bit is enough. > > > > + /* No SEPT #VE controls for the guest: check the attribute */ > > + if (td_attr & ATTR_SEPT_VE_DISABLE) > > + return; > > nit: Given that we expect most guests to actually have this attribute set > perhaps moving this check at the top of the function will cause it exit > early more often than not? The attribute is not reliable source if flexible VE controls are present as I mentioned in the commit message. We can only rely on it if there's no TDCS_CONFIG_FLEXIBLE_PENDING_VE. > > + > > + /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */ > > + if (debug) > > + pr_warn("%s\n", msg); > > + else > > + tdx_panic(msg); > > + return; > > + } > > + > > + /* Check if SEPT #VE has been disabled before us */ > > + tdg_vm_rd(TDCS_TD_CTLS, &controls); > > + if (controls & TD_CTLS_PENDING_VE_DISABLE) > > + return; > > + > > + /* Keep #VEs enabled for splats in debugging environments */ > > + if (debug) > > + return; > > + > > + /* Disable SEPT #VEs */ > > + tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE, > > + TD_CTLS_PENDING_VE_DISABLE); > > + > > + return; > > +} > > + > > static void tdx_setup(u64 *cc_mask) > > { > > struct tdx_module_args args = {}; > > <snip> -- Kiryl Shutsemau / Kirill A. Shutemov