Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap

Michael Roth <michael.roth@xxxxxxx> · Mon, 19 Sep 2022 12:56:15 -0500

On Wed, Sep 14, 2022 at 08:05:49AM +0000, Sean Christopherson wrote:
> On Thu, Sep 08, 2022, Michael Roth wrote:
> > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > So in the context of this interim solution, we're trying to look for a
> > solution that's simple enough that it can be used reliably, without
> > introducing too much additional complexity into KVM. There is one
> > approach that seems to fit that bill, that Brijesh attempted in an
> > earlier version of this series (I'm not sure what exactly was the
> > catalyst to changing the approach, as I wasn't really in the loop at
> > the time, but AIUI there weren't any showstoppers there, but please
> > correct me if I'm missing anything):
> > 
> >  - if the host is writing to a page that it thinks is supposed to be
> >    shared, and the guest switches it to private, we get an RMP fault
> >    (actually, we will get a !PRESENT fault, since as of v5 we now
> >    remove the mapping from the directmap as part of conversion)
> >  - in the host #PF handler, if we see that the page is marked private
> >    in the RMP table, simply switch it back to shared
> >  - if this was a bug on the part of the host, then the guest will see

Hi Sean,

Thanks for the input here and at KVM Forum.

> 
> As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> is not a viable approach.  There was also extensive discussion on-list a while back:
> 
> https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com

I think that was likely the only hope for a non-UPM approach, as
anything else would require a good bit of infrastructure in KVM and
elsewhere to avoid that situation occurring to begin with, and it
probably would not be worth the effort outside the context of a
general/platform-independent solution like UPM. I was hoping it would be
possible to work through Andy's concerns, but the concerns you and Paolo
raised of potential surprise #PFs in other parts of the kernel are
something I'm less optimistic about, so I agree UPM is probably the right
place to focus efforts.

> 
> > AIUI, this is still sort of an open question, but you noted how nuking
> > the directmap without any formalized interface for letting the kernel
> > know about it could be problematic down the road, which also sounds
> > like the sort of thing more suited for having UPM address at a more
> > general level, since there are similar requirements for TDX as well.
> > 
> > AIUI there are 2 main arguments against splitting the directmap:
> >  a) we can't easily rebuild it atm
> >  b) things like KSM might still try to access private pages
> > 
> > But unmapping also suffers from a), since we still end up splitting the
> > directmap unless we remove pages in blocks of 2M.
> 
> But for UPM, it's easy to rebuild the direct map since there will be an explicit,
> kernel controlled point where the "inaccessible" memfd releases the private page.

I was thinking it would be possible to do something similar by doing page
splitting/restore in bulk as part of MEM_ENCRYPT_{REG,UNREG}_REGION, but
yes UPM also allows for a convenient point in time to split/unsplit.

> 
> > But nothing prevents a guest from switching a single 4K page to private, in
> > which case we are forced to split. That would be normal behavior on the part
> > of the guest for setting up GHCB pages/etc, so we still end up splitting the
> > directmap over time.
> 
> The host actually isn't _forced_ to split with UPM.  One option would be to refuse
> to split the direct map and instead force userspace to eat the 2mb allocation even
> though it only wants to map a single 4kb chunk into the guest.  I don't know that
> that's a _good_ option, but it is an option.

That does seem like a reasonable option. Maybe it also opens up a path
for hugetlbfs support of sorts. In practice I wouldn't expect too many
of those pages to be wasted; the worst case would be 2MB per shared page in
the guest... I suppose that could add up for GHCB pages and whatnot if
there are lots of vCPUs, but at that point you're likely dealing with
large guests with enough memory to spare. Could be another pain point
regarding calculating appropriate memory limits for userspace though.
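
To put rough numbers on that worst case, here is a back-of-the-envelope
sketch. It assumes every shared 4K page lands in a different 2M region
and that each such region is allocated whole; the one-GHCB-page-per-vCPU
figure and the function name are illustrative assumptions, not anything
taken from the series.

```python
# Rough worst-case estimate only; not derived from the patch series.
KIB = 1024
MIB = 1024 * KIB
SMALL_PAGE = 4 * KIB   # granularity the guest shares at
HUGE_PAGE = 2 * MIB    # granularity the host refuses to split below


def worst_case_overhead(shared_4k_pages: int) -> tuple[int, int]:
    """Return (bytes allocated, bytes wasted), assuming every shared 4K
    page sits in its own 2M region and each region is allocated whole."""
    allocated = shared_4k_pages * HUGE_PAGE
    wasted = allocated - shared_4k_pages * SMALL_PAGE
    return allocated, wasted


if __name__ == "__main__":
    # Assumption: one shared GHCB page per vCPU, nothing else shared.
    for vcpus in (8, 64, 256):
        allocated, wasted = worst_case_overhead(vcpus)
        print(f"{vcpus:4d} vCPUs: {allocated // MIB} MiB allocated, "
              f"{wasted / MIB:.1f} MiB wasted")
```

Under those assumptions a 256-vCPU guest would pin 512 MiB to back 1 MiB
of GHCB pages, which is the kind of figure userspace would have to
account for when setting memory limits.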

Thanks!

-Mike