Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 24 Apr 2023 17:26:52 -0700

On Mon, Apr 24, 2023, Nadav Amit wrote:
> Feel free to tell me to shut up, as it is none of my business, and I might be
> missing a lot of context. Yet, it never stopped me before. :)
> 
> > On Apr 24, 2023, at 1:35 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > 
> > On Mon, Apr 24, 2023, Nadav Amit wrote:
> >> 
> >>> On Apr 24, 2023, at 10:54 AM, Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
> >>> Sean did mention that he wanted KVM_CAP_MEMORY_FAULT_INFO in general,
> >>> so I'm guessing (some version of) that will (eventually :) be merged
> >>> in any case.
> >> 
> >> It certainly not my call. But if you ask me, introducing a solution for
> >> a concrete use-case that requires API changes/enhancements is not
> >> guaranteed to be the best solution. It may be better first to fully
> >> understand the existing overheads and agree that there is no alternative
> >> cleaner and more general solution with similar performance.
> > 
> > KVM already returns -EFAULT for these situations, the change I really want to land
> > is to have KVM report detailed information about why the -EFAULT occurred.  I'll be
> > happy to carry the code in KVM even if userspace never does anything beyond dumping
> > the extra information on failures.
> 
> I thought that the change is to inform on page-faults through a new interface
> instead of the existing uffd-file-based one. There is already another interface
> (signals) and I thought (but did not upstream) io-uring. You now suggest yet
> another one.

There are two capabilities proposed in this series: one to provide more information
when KVM encounters a fault it can't resolve, and another to tell KVM to kick out
to userspace instead of attempting to resolve a fault when a given address couldn't
be resolved with fast gup.  I'm talking purely about the first one: providing more
information when KVM exits.

As for the second, my plan is to try and stay out of the way and let people that
actually deal with the userspace side of things settle on an approach.  From the
KVM side, supporting the "don't wait to resolve faults" flag is quite simple so
long as KVM punts the heavy lifting to userspace, e.g. identifying _why_ the address
isn't mapped with the appropriate permissions.

> I am not sure it is very clean. IIUC, it means that you would still need in
> userspace to monitor uffd, as qemu (or whatever-kvm-userspace-counterpart-you-
> use) might also trigger a page-fault. So userspace becomes more complicated,
> and the ordering of different events/page-faults reporting becomes even more
> broken.
> 
> At the same time, you also break various assumptions of userfaultfd. I don’t
> immediately find some functionality that would break, but I am not very
> confident about it either.
> 
> > 
> >> Considering the mess that KVM async-PF introduced, I would be very careful
> >> before introducing such API changes. I did not look too much on the details,
> >> but some things anyhow look slightly strange (which might be since I am
> >> out-of-touch with KVM). For instance, returning -EFAULT on from KVM_RUN? I
> >> would have assumed -EAGAIN would be more appropriate since the invocation did
> >> succeed.
> > 
> > Yeah, returning -EFAULT is somewhat odd, but as above, that's pre-existing
> > behavior that's been around for many years.
> 
> Again, it is none of my business, but don’t you want to gradually try to fix
> the interface so maybe on day you would be able to deprecate it?
>
> IOW, if you already introduce a new interface which is enabled with a new
> flag, which would require userspace change, then you can return the more
> appropriate error-code.

In a perfect world, yes.  But unfortunately, the relevant plumbing in KVM is quite
brittle (understatement) with respect to returning "0", and AFAICT, returning
-EFAULT instead of 0 is nothing more than an odditity.  E.g. at worst, it could be
suprising to users writing a new VMM from scratch.

But I hadn't thought about returning a _different_ error code.  -EAGAIN isn't
obviously better though, e.g. my understanding is that -EAGAIN typically means that
retrying will succeed, but in pretty much every case where KVM returns -EFAULT,
simply trying again will never succeed.  It's not even guaranteed to be appropriate
in the "don't wait to resolve faults" case, because KVM won't actually know why
an address isn't accessible, e.g. it could be because the page needs to be faulted
in, but it could also be a due to a guest bug that resulted in a permission
violation.

All in all, returning -EFAULT doesn't seem egregious.  I can't recall a single complaint
about returning -EFAULT instead of -XYZ, just complaints about not KVM providing any
useful information.  So absent a concrete need for returning something other than
-EFAULT, I'm definitely inclined to maintain the status quo, even though it's imperfect.