Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap

Brijesh Singh <brijesh.singh@xxxxxxx> · Fri, 15 Oct 2021 11:31:51 -0500

On 10/13/21 1:16 PM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Sean Christopherson wrote:
>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
>>> be either private or shared. A write from the hypervisor goes through
>>> the RMP checks. If hardware sees that the hypervisor is attempting to
>>> write to a guest private page, then it triggers an RMP violation #PF.
>>>
>>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
>>> used to verify that it's safe to map a given guest page. Use SRCU to
>>> protect against the page state change for existing mapped pages.
>> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
>> forces it to wait for existing maps to go away, but it doesn't prevent new maps
>> from being created while the actual RMP updates are in-flight.  Most telling is
>> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> Argh, another goof on my part.  Rereading prior feedback, I see that I loosely
> suggested SRCU as a possible solution.  That was a bad, bad suggestion.  I think
> (hope) I made it offhand without really thinking it through.  SRCU can't work in
> this case, because the whole premise of Read-Copy-Update is that there can be
> multiple copies of the data.  That simply can't be true for the RMP as hardware
> operates on a single table.
>
> In the future, please don't hesitate to push back on and/or question suggestions,
> especially those that are made without concrete examples, i.e. are likely off the
> cuff.  My goal isn't to set you up for failure :-/

What do you think about going back to my initial proposal of per-gfn
tracking [1]? We can limit the changes to just kvm_vcpu_map() and let
copy_to_user() take a fault and return an error (if it attempts to
write to a guest private page). If a PSC happens while the lock is
held, then simply return and let the guest retry the PSC.

[1]
https://lore.kernel.org/lkml/20210707183616.5620-36-brijesh.singh@xxxxxxx/
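
To make the per-gfn tracking idea above a bit more concrete, here is a
rough userspace sketch of the behavior I have in mind. It is not the
code from [1]: the names (struct gfn_tracker, gfn_track_map(),
gfn_track_unmap(), gfn_track_psc()), the fixed-size table, and the
choice of -EFAULT/-EAGAIN are all assumptions made for illustration.
A real implementation would also need a lock around the table, which
is omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical: tiny fixed-size table, indexed directly by gfn. */
#define NR_TRACKED_GFNS 16

struct gfn_tracker {
	/* Real code would need a lock protecting both arrays. */
	unsigned int map_count[NR_TRACKED_GFNS]; /* live host mappings */
	bool is_private[NR_TRACKED_GFNS];        /* guest-private state */
};

/*
 * Called on the map path (think kvm_vcpu_map()). Refuses to map a
 * guest-private page; otherwise records that the gfn is mapped.
 */
static int gfn_track_map(struct gfn_tracker *t, gfn_t gfn)
{
	if (gfn >= NR_TRACKED_GFNS)
		return -EINVAL;
	if (t->is_private[gfn])
		return -EFAULT;
	t->map_count[gfn]++;
	return 0;
}

/* Called on the unmap path; drops one mapping reference. */
static void gfn_track_unmap(struct gfn_tracker *t, gfn_t gfn)
{
	if (gfn < NR_TRACKED_GFNS && t->map_count[gfn] > 0)
		t->map_count[gfn]--;
}

/*
 * Called for a page state change (PSC). If the host still has the
 * gfn mapped, bail out with -EAGAIN so the guest can retry the PSC
 * later; otherwise update the tracked state.
 */
static int gfn_track_psc(struct gfn_tracker *t, gfn_t gfn, bool to_private)
{
	if (gfn >= NR_TRACKED_GFNS)
		return -EINVAL;
	if (t->map_count[gfn] > 0)
		return -EAGAIN;
	t->is_private[gfn] = to_private;
	return 0;
}
```

The point of the sketch is only the ordering: a PSC on a gfn that is
currently mapped fails and is retried, and a map of a gfn that is
already private is refused up front, so the hypervisor never writes
through a stale mapping.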