Re: [PATCH v5 11/17] KVM: x86: Enable KVM_CAP_USERFAULT_ON_MISSING

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 2 Nov 2023 07:31:27 -0700

On Wed, Nov 01, 2023, Anish Moorthy wrote:
> On Wed, Oct 4, 2023 at 6:52 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Fri, Sep 08, 2023, Anish Moorthy wrote:
> > > The relevant __gfn_to_pfn_memslot() calls in __kvm_faultin_pfn()
> > > already use MEMSLOT_ACCESS_NONATOMIC_MAY_UPGRADE.
> >
> > --verbose
> 
> Alright, how about the following?
> 
>     KVM: x86: Enable KVM_CAP_EXIT_ON_MISSING
> 
>     __gfn_to_pfn_memslot(), used by the stage-2 fault handler, already
>     checks the memslot flag to avoid faulting in missing pages as required.
>     Therefore, enabling the capability is just a matter of selecting the kconfig
>     to allow the flag to actually be set.
> 
> > Hmm, I vote to squash this patch with
> >
> >   KVM: x86: Annotate -EFAULTs from kvm_handle_error_pfn()
> >
> > or rather, squash that code into this patch.  Ditto for the ARM patches.
> >
> > Since we're making KVM_EXIT_MEMORY_FAULT informational-only for flows that KVM
> > isn't committing to supporting, I think it makes sense to enable the arch specific
> > paths that *need* to return KVM_EXIT_MEMORY_FAULT at the same time as the feature
> > that adds the requirement.
> >
> > E.g. without HAVE_KVM_USERFAULT_ON_MISSING support, per the contract we are creating,
> > it would be totally fine for KVM to not use KVM_EXIT_MEMORY_FAULT for the page
> > fault paths.  And that obviously has to be the case since KVM_CAP_MEMORY_FAULT_INFO
> > is introduced before the arch code is enabled.
> >
> > But as of this path, KVM *must* return KVM_EXIT_MEMORY_FAULT, so we should make
> > it impossible to do a straight revert of that dependency.
> 
> Should we really be concerned about people reverting the
> KVM_CAP_MEMORY_FAULT_INFO commit(s) in this way?

Yes.  A revert is highly unlikely, but possible.  Keep in mind that with upstream
KVM, there are a *lot* of end users that don't know the inner workings of KVM
(or the kernel).  When things break, the standard approach for such users is to
bisect to find where things first broke, and then to try reverting the suspected
commit to see if that fixes the problem.

In the (again, highly unlikely) case that filling run->memory_fault breaks something,
an unsuspecting user will bisect to that commit and revert.  Then they're suddenly
in a situation where they've unknowing broken KVM, and likely in a way that won't
immediately fail.

*Nothing* in either changelog gives any clue that there is a hard dependency.
Even the slightly more verbose version above provides almost no indication of the
real dependency, as it primarily talks about __gfn_to_pfn_memslot() and
KVM_CAP_EXIT_ON_MISSING.

And now that we've punted on trying annotate "everything", there's no sane way
for the "Annotate -EFAULTs from kvm_handle_error_pfn()" changelog to communicate
that it will have a hard dependency in the future.  If the changelog says "this
will be used by blah blah blah", then it raises the question of "well why isn't
this added there?".

And the patches are tiny.  Having a final "enable and advertise XYZ" patch is
relatively common for new features, but that's often because the enabling of the
new feature is spread across multiple patches.  E.g. see the LAM support sitting
in "https://github.com/kvm-x86/linux.git lam".  And in those cases, the hard
dependency is always very obvious, e.g. if someone complained that reverting
"Virtualize LAM for user pointer" but not "Advertise and enable LAM (user and
supervisor)" caused problems, we'd be like, "well, yeah".

> I see what you're> saying- but it also seems to me that KVM could add other
> things akin to KVM_CAP_EXIT_ON_MISSING in the future, and then we end up in
> the exact same situation.

No, it's not the exact same situation.

First and foremost, we *can't* solve that problem, because some future feature
doesn't exist yet.

Second, merging features into two different kernel releases creates a very different
situation.  Let's say this goes into kernel 6.9, and then some future feature
lands in kernel 6.11 and has a hard dependency on the annotations.  The odds of
needing to revert a patch from 6.9 while upgrading to 6.11 are significnatly lower
than the odds of needing to revert a patch from 6.9-rc1 between when rc1 is released
and a user upgrades to 6.9.  And users aren't stupid; they might not know the inner
workings of KVM, but bisecting to a patch from 6.9 when upgrading to 6.11 would
give them pause.

Third, with the patches squashed, to revert the annotations, a person would also
be reverting _this_ patch (because they'd be one and the same).  At that point,
they're no longer reverting a seemingly innocous "give userspace more info" commit,
they're reverting something that clearly advertises a feature userspace, which
again provides a clue that a straight revert might not be super safe.

> Sure the squash might make sense for the specific commits in the series, but
> there's a general issue that isn't really being solved.

Patches within a series and future series are two very different things.

> Maybe I'm just letting the better be the enemy of the best,

The "best" isn't even possible, unless we never ship anything, ever.

> but I do like the current separation/single-focus of the commits.

Because you already know what the entire series does.  Someone looking at this
patch without already understanding the big picture is going to have no idea that
these two patches are tightly coupled.

And again, now that we've punted on annotating everything, I see zero reason to
split the patches.  Maybe you could argue it provides a new bisection point, but
again the patches are _tiny_, and that same bisection point is effectively achieved
by running with an "older" userspace, i.e. a userspace that doesn't utilize the
KVM_MEM_EXIT_ON_MISSING, which is literally every userspace in existence right now.

> That said, the squash is nbd and I can go ahead if you're not convinced.