Re: [PATCH v8 13/16] x86/virt/tdx: Configure global KeyID on all packages

Dave Hansen <dave.hansen@xxxxxxxxx> · Tue, 10 Jan 2023 08:53:06 -0800

On 1/10/23 02:15, Huang, Kai wrote:
> On Fri, 2023-01-06 at 14:49 -0800, Dave Hansen wrote:
>> On 12/8/22 22:52, Kai Huang wrote:
...
>>> + * Note:
>>> + *
>>> + * This function neither checks whether there's at least one online cpu
>>> + * for each package, nor explicitly prevents any cpu from going offline.
>>> + * If any package doesn't have any online cpu then the SEAMCALL won't be
>>> + * done on that package and the later step of TDX module initialization
>>> + * will fail.  The caller needs to guarantee this.
>>> + */
>>
>> *Does* the caller guarantee it?
>>
>> You're basically saying, "this code needs $FOO to work", but you're not
>> saying who *provides* $FOO.
> 
> In short, KVM can take steps toward this but won't guarantee it 100%.
> 
> Specifically, KVM won't actively try to bring up a cpu to guarantee this if
> any package has no online cpu at all (see the first lore link below).
> But KVM can _check_ whether this condition has been met before calling
> tdx_init() and speak out if not.  In the meantime, if the condition is met,
> it can refuse to offline the last cpu of each package (or any cpu) during
> module initialization.
> 
> And KVM needs similar handling anyway.  The reason is that configuring the
> global KeyID is not the only step with this requirement: creating/destroying
> a TD (which involves programming/reclaiming one TDX KeyID) also requires at
> least one online cpu in each package.
> 
> There were discussions on the KVM side about how to handle this.  IIUC the
> solution is that KVM will:
> 1) fail to create a TD if any package has no online cpu.
> 2) refuse to offline the last cpu of each package while there's any _active_
> TDX guest running.
> 
> https://lore.kernel.org/lkml/20221102231911.3107438-1-seanjc@xxxxxxxxxx/T/#m1ff338686cfcb7ba691cd969acc17b32ff194073
> https://lore.kernel.org/lkml/de6b69781a6ba1fe65535f48db2677eef3ec6a83.1667110240.git.isaku.yamahata@xxxxxxxxx/
> 
> Thus TDX module initialization in KVM can be handled in a similar way.
> 
> Btw, in v7 (which has a per-lp init requirement on all cpus), tdx_init() does
> an early check on whether all cpus present at machine boot time are online and
> simply returns an error if the condition is not met.  Here the difference is
> that we don't have any check but depend on the SEAMCALL to fail.  To me
> there's no fundamental difference.

So, I'm going to call shenanigans here.

You say:

	The caller needs to guarantee this.

Then, you go and tell us how the *ONE* caller of this function doesn't
actually guarantee this.  Plus, you *KNOW* this.

Those are shenanigans.

Let's do something like this instead of asking for something impossible
and pretending that the callers are going to provide some fantasy solution.

/*
 * Attempt to configure the global KeyID on all physical packages.
 *
 * This requires running code on at least one CPU in each package.  If a
 * package has no online CPUs, that code will not run and TDX module
 * initialization (TDH.whatever) will fail.
 *
 * This code takes no affirmative steps to online CPUs.  Callers (aka
 * KVM) can ensure success by keeping at least one CPU online in each
 * package.
 */

Now, since this _is_ all imperfect, what will our users see if this
house of cards falls down?  Will they get a nice error message like:

     TDX: failed to configure module, no online CPUs in package 12

Or, will they see:

     TDX: Hurr, durr, I'm confused and you should be too

?
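
For illustration only, here is a minimal userspace sketch of the kind of
check that could produce the first message rather than the second.
Everything in it is hypothetical and not taken from the patch: struct
cpu_info, both function names, and the assumption that package IDs run
densely from 0 to nr_packages - 1.  Real kernel code would use the
existing topology and cpumask helpers instead of a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for kernel topology data: one entry per CPU. */
struct cpu_info {
	int package_id;		/* physical package this CPU belongs to */
	bool online;		/* true if the CPU is currently online */
};

/*
 * Return the ID of the first package that has no online CPU, or -1 if
 * every package has at least one.  Assumes package IDs are dense:
 * 0 .. nr_packages - 1.
 */
static int first_package_without_online_cpu(const struct cpu_info *cpus,
					    size_t nr_cpus, int nr_packages)
{
	for (int pkg = 0; pkg < nr_packages; pkg++) {
		bool found = false;

		for (size_t i = 0; i < nr_cpus; i++) {
			if (cpus[i].package_id == pkg && cpus[i].online) {
				found = true;
				break;
			}
		}
		if (!found)
			return pkg;
	}
	return -1;
}

/*
 * Check the per-package requirement up front.  Returns 0 and an empty
 * message on success, or -1 with a specific diagnostic in buf that
 * names the offending package.
 */
static int check_packages(const struct cpu_info *cpus, size_t nr_cpus,
			  int nr_packages, char *buf, size_t len)
{
	int pkg = first_package_without_online_cpu(cpus, nr_cpus, nr_packages);

	if (pkg < 0) {
		if (len)
			buf[0] = '\0';
		return 0;
	}

	snprintf(buf, len,
		 "TDX: failed to configure module, no online CPUs in package %d",
		 pkg);
	return -1;
}
```

The point of the sketch is only that the failure path knows which
package was the problem at the moment it fails, so it can say so instead
of reporting a generic SEAMCALL error later.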