Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Fri, 03 Dec 2021 00:48:43 +0100

Kirill,

On Fri, Dec 03 2021 at 01:21, Kirill A. Shutemov wrote:
> On Thu, Nov 25, 2021 at 01:40:24AM +0100, Thomas Gleixner wrote:
>> Kuppuswamy,
>> Either that or you provide patches with arguments which are based on
>> proper analysis and not on 'appears to' observations.
>
> I think the right solution to the WBINVD would be to add a #VE handler
> that does nothing. We don't have a reasonable way to handle it from within
> the guest. We can call the VMM in hope that it would handle it, but VMM is
> untrusted and it can ignore the request.
>
> Dave suggested that we need to do code audit to make sure that there's no
> user inside TDX guest environment that relies on WBINVD to work correctly.
>
> Below is full call tree of WBINVD. It is substantially larger than I
> anticipated from initial grep.
>
> Conclusions:
>
>   - Most of callers are in ACPI code on changing S-states. Ignoring cache
>     flush for S-state change on virtual machine should be safe.
>
>   - The only WBINVD I was able to trigger is on poweroff from ACPI code.
>     Reboot also should trigger it, but for some reason I don't see it.
>
>   - Few caller in CPU offline code. TDX does not allowed to offline CPU as
>     we cannot bring it back -- we don't have SIPI. And even if offline
>     works for vCPU it should be safe to ignore WBINVD there.
>
>   - NVDIMMs are not supported inside TDX. If it will change we would need
>     to deal with cache flushing for this case. Hopefully, we would be able
>     to avoid WBINVD.
>
>   - Cache QoS and MTRR use WBINVD. They are disabled in TDX, but it is
>     controlled by VMM if the feature is advertised. We would need to
>     filter CPUID/MSRs to make sure VMM would not mess with them.
>
> Is it good enough justification for do-nothing #VE WBINVD handler?

first of all thank you very much for this very profound analysis.

This is really what I was asking for and you probably went even a step
deeper than that. Very appreciated.

What we should do instead of doing a wholesale let's ignore WBINVD is to
have a separate function/macro:

 ACPI_FLUSH_CPU_CACHE_PHYS()

and invoke that from the functions which are considered to be safe.

That would default to ACPI_FLUSH_CPU_CACHE() for other architecures
obviously.

Then you can rightfully do:

#define ACPI_FLUSH_CPU_CACHE_PHYS()     \
        if (!cpu_feature_enabled(XXX))	\
        	wbinvd();               \              

where $XXX might be FEATURE_TDX_GUEST for paranoia sake and then
extended to X86_FEATURE_HYPERVISOR if everyone agrees.

Then you have the #VE handler which just acts on any other wbinvd
invocation via warn, panic, whatever, no?

Thanks,

        tglx