Re: [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

Michael Roth <michael.roth@xxxxxxx> · Tue, 19 Dec 2023 10:20:26 -0600

On Tue, Nov 21, 2023 at 05:21:49PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:40AM -0500, Michael Roth wrote:
> > +static int rmpupdate(u64 pfn, struct rmp_state *val)
> 
> rmp_state *state
> 
> so that it is clear what this is.
> 
> > +{
> > +	unsigned long paddr = pfn << PAGE_SHIFT;
> > +	int ret, level, npages;
> > +	int attempts = 0;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > +		return -ENXIO;
> > +
> > +	do {
> > +		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> > +		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> > +			     : "=a"(ret)
> > +			     : "a"(paddr), "c"((unsigned long)val)
> 
> Add an empty space between the " and the (
> 
> > +			     : "memory", "cc");
> > +
> > +		attempts++;
> > +	} while (ret == RMPUPDATE_FAIL_OVERLAP);
> 
> What's the logic here? Loop as long as it says "overlap"?
> 
> How "transient" is that overlapping condition?
> 
> What's the upper limit of that loop?
> 
> This loop should check a generously chosen upper limit of attempts and
> then break if that limit is reached.

We've raised similar questions to David Kaplan and discussed this to a
fair degree.

The transient condition here is due to firmware locking the 2MB-aligned
RMP entry for the range to handle atomic updates. There is no upper bound
on retries or the amount of time spent, but it is always transient since
multiple hypervisor implementations now depend on this and any deviation
from this assurance would constitute a firmware regression.

A good torture test for this path is lots of 4K-only guests doing
concurrent boot/shutdowns in a tight loop. With week-long runs the
longest delay seen was on the order of 100ns, but there's no real 
correlation between time spent and number of retries, sometimes
100ns delays only involve 1 retry, sometimes much smaller time delays
involve hundreds of retries, and it all depends on what firmware is
doing, so there's no way to infer a safe retry limit based on that
data.

All that said, there are unfortunately other conditions that can
trigger non-transient RMPUPDATE_FAIL_OVERLAP failures, and these will
result in an infinite loop. Those are the result of host misbehavior
however, like trying to set up 2MB private RMP entries when there are
already private 4K entries in the range. Ideally these would be separate
error codes, but even if that were changed in firmware we'd still need
code to support older firmwares that don't disambiguate so not sure this
situation can be improved much.

> 
> > +	if (ret) {
> > +		pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
> > +		       attempts, ret, pfn, npages, level);
> 
> You're dumping here uninitialized stack variables npages and level.
> Looks like leftover from some prior version of this function.

Yah, I'll clean this up. I think logging the attempts probably doesn't
have much use anymore either.

> 
> > +		sev_dump_rmpentry(pfn);
> > +		dump_stack();
> 
> This is going to become real noisy on a huge machine with a lot of SNP
> guests.

Since the transient case will eventually resolve to ret==0, we will only
get here on a kernel oops sort of condition where a stack dump seems
appropriate. rmpupdate() shouldn't error during normal operation, and if
it ever does it will likely be a fatal situation where those stack dumps
will be useful.

Thanks,

Mike

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
>