On Tue, Nov 21, 2023 at 05:21:49PM +0100, Borislav Petkov wrote: > On Mon, Oct 16, 2023 at 08:27:40AM -0500, Michael Roth wrote: > > +static int rmpupdate(u64 pfn, struct rmp_state *val) > > rmp_state *state > > so that it is clear what this is. > > > +{ > > + unsigned long paddr = pfn << PAGE_SHIFT; > > + int ret, level, npages; > > + int attempts = 0; > > + > > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) > > + return -ENXIO; > > + > > + do { > > + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */ > > + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE" > > + : "=a"(ret) > > + : "a"(paddr), "c"((unsigned long)val) > > Add an empty space between the " and the ( > > > + : "memory", "cc"); > > + > > + attempts++; > > + } while (ret == RMPUPDATE_FAIL_OVERLAP); > > What's the logic here? Loop as long as it says "overlap"? > > How "transient" is that overlapping condition? > > What's the upper limit of that loop? > > This loop should check a generously chosen upper limit of attempts and > then break if that limit is reached. We've raised similar questions to David Kaplan and discussed this to a fair degree. The transient condition here is due to firmware locking the 2MB-aligned RMP entry for the range to handle atomic updates. There is no upper bound on retries or the amount of time spent, but it is always transient since multiple hypervisor implementations now depend on this and any deviation from this assurance would constitute a firmware regression. A good torture test for this path is lots of 4K-only guests doing concurrent boot/shutdowns in a tight loop. With week-long runs the longest delay seen was on the order of 100ns, but there's no real correlation between time spent and number of retries, sometimes 100ns delays only involve 1 retry, sometimes much smaller time delays involve hundreds of retries, and it all depends on what firmware is doing, so there's no way to infer a safe retry limit based on that data. All that said, there are unfortunately other conditions that can trigger non-transient RMPUPDATE_FAIL_OVERLAP failures, and these will result in an infinite loop. Those are the result of host misbehavior however, like trying to set up 2MB private RMP entries when there are already private 4K entries in the range. Ideally these would be separate error codes, but even if that were changed in firmware we'd still need code to support older firmwares that don't disambiguate so not sure this situation can be improved much. > > > + if (ret) { > > + pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n", > > + attempts, ret, pfn, npages, level); > > You're dumping here uninitialized stack variables npages and level. > Looks like leftover from some prior version of this function. Yah, I'll clean this up. I think logging the attempts probably doesn't have much use anymore either. > > > + sev_dump_rmpentry(pfn); > > + dump_stack(); > > This is going to become real noisy on a huge machine with a lot of SNP > guests. Since the transient case will eventually resolve to ret==0, we will only get here on a kernel oops sort of condition where a stack dump seems appropriate. rmpupdate() shouldn't error during normal operation, and if it ever does it will likely be a fatal situation where those stack dumps will be useful. Thanks, Mike > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette >