Re: [PATCH V2 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt()

Vishal Annapurve <vannapurve@xxxxxxxxxx> · Tue, 4 Feb 2025 10:52:07 -0800

On Tue, Feb 4, 2025 at 9:32 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> I think this is the right fix for _now_. In practice, Vishal's problem
> only occurs on CONFIG_PARAVIRT_XXL systems. His proposed fix here does
> not make TDX depend on CONFIG_PARAVIRT_XXL, it just provides an extra
> override when TDX and CONFIG_PARAVIRT_XXL collide.
>
> This seems like a reasonable compromise that avoids entangling
> PARAVIRT_XXL and TDX _too_ much and also avoids reinventing a hunk of
> PARAVIRT_XXL just to fix this bug.
To ensure that we spend a bit more time here, would folks be ok with
making TDX depend on CONFIG_PARAVIRT_XXL as a stopgap until we have
the long term proposal Dave mentioned below to cleanly separate
"pv_ops.irq.safe_halt()" from paravirt infra?

>
> Long-term, I think it would be nice to move pv_ops.irq.safe_halt() away
> from being a paravirt thing and move it over to a plain static_call().
>
> Then, TDX can get rid of this hunk:
>
>                 pr_info("using TDX aware idle routine\n");
>                 static_call_update(x86_idle, tdx_safe_halt);
>
> and move back to default_idle() which could look like this:
>
>  void __cpuidle default_idle(void)
>  {
> -        raw_safe_halt();
> +        static_call(x86_safe_halt)();
>          raw_local_irq_disable();
>  }
>
> If 'x86_safe_halt' was the only route in the kernel to call 'sti;hlt'
> then we can know with pretty high confidence if TDX or Xen code sets
> their own 'x86_safe_halt' that they won't run into more bugs like this one.
>
> On to the patch itself...
>
> On 1/29/25 15:25, Vishal Annapurve wrote:
> > Direct HLT instruction execution causes #VEs for TDX VMs which is routed
> > to hypervisor via tdvmcall. This process renders HLT instruction
> > execution inatomic, so any preceding instructions like STI/MOV SS will
> > end up enabling interrupts before the HLT instruction is routed to the
> > hypervisor. This creates scenarios where interrupts could land during
> > HLT instruction emulation without aborting halt operation leading to
> > idefinite halt wait times.
>
> Vishal! I'm noticing spelling issues right up front and center here,
> just like in v1. I think I asked nicely last time if you could start
> spell checking your changelogs before posting v2. Any chance you could
> actually put some spell checking in place before v3?

Yeah, will do. I incorrectly thought that codespell runs by default
with checkpatch.pl.

>
> So, the x86 STI-shadow mechanism has left a trail of tears. We don't
> want to explain the whole sordid tale here, but I don't feel like
> talking about the "what" (atomic vs. inatomic execution) without
> explaining "why" is really sufficient to explain the problem at hand.
>
> Sean had a pretty concise description in here that I liked:
>
>         https://lore.kernel.org/all/Z5l6L3Hen9_Y3SGC@xxxxxxxxxx/
>
> But the net result is that it is currently unsafe for TDX guests to use
> the "sti;hlt" sequence. It's really important to say *that* somewhere.
>
Ack.

> > Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") already
> > upgraded x86_idle() to invoke tdvmcall to avoid such scenarios, but
> > it didn't cover pv_native_safe_halt() which can be invoked using
> > raw_safe_halt() from call sites like acpi_safe_halt().
>
> Does this convey the same thing?
>
> Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> prevented the idle routines from using "sti;hlt". But it missed the
> paravirt routine. That can be reached like this, for example:
>
>         acpi_safe_halt() =>
>         raw_safe_halt()  =>
>         arch_safe_halt() =>
>         irq.safe_halt()  =>
>         pv_native_safe_halt()
>
> I also dislike the "upgrade" nomenclature. It's not really an "upgrade".
Ack.
>
> ...
> > @@ -380,13 +381,18 @@ static int handle_halt(struct ve_info *ve)
> >  {
> >       const bool irq_disabled = irqs_disabled();
> >
> > +     if (!irq_disabled) {
> > +             WARN_ONCE(1, "HLT instruction emulation unsafe with irqs enabled\n");
> > +             return -EIO;
> > +     }
>
> The warning is fine, but I do think it should be separated from the bug fix.
>
> >
> > -void __cpuidle tdx_safe_halt(void)
> > +void __cpuidle tdx_idle(void)
> >  {
> >       const bool irq_disabled = false;
> >
> > @@ -397,6 +403,12 @@ void __cpuidle tdx_safe_halt(void)
> >               WARN_ONCE(1, "HLT instruction emulation failed\n");
> >  }
> >
> > +static void __cpuidle tdx_safe_halt(void)
> > +{
> > +     tdx_idle();
> > +     raw_local_irq_enable();
> > +}
>
> The naming here is a bit wonky. Think of how the call chain will look:
>
>         irq.safe_halt() =>
>         tdx_safe_halt() =>
>         tdx_idle()      =>
>         __halt()
>
> See how it's doing a more and more TDX-specific halt operation? Isn't
> the "idle" call right in the middle confusing?
Makes sense.

>
> >  static int read_msr(struct pt_regs *regs, struct ve_info *ve)
> >  {
> >       struct tdx_module_args args = {
> > @@ -1083,6 +1095,15 @@ void __init tdx_early_init(void)
> >       x86_platform.guest.enc_kexec_begin           = tdx_kexec_begin;
> >       x86_platform.guest.enc_kexec_finish          = tdx_kexec_finish;
> >
> > +#ifdef CONFIG_PARAVIRT_XXL
> > +     /*
> > +      * halt instruction execution is not atomic for TDX VMs as it generates
> > +      * #VEs, so otherwise "safe" halt invocations which cause interrupts to
> > +      * get enabled right after halt instruction don't work for TDX VMs.
> > +      */
> > +     pv_ops.irq.safe_halt = tdx_safe_halt;
> > +#endif
>
> Just like the changelog, it's hard to write a good comment without going
> into the horrors of the STI-shadow. But I think this is a bit more to
> the point:
>
>         /*
>          * Avoid the literal hlt instruction in TDX guests. hlt will
>          * induce a #VE in the STI-shadow which will enable interrupts
>          * in a place where they are not wanted.
>          */
>
>

Ack, will take care of these comments in v3.