On 8/3/20 12:16 PM, Sean Christopherson wrote:
> On Mon, Aug 03, 2020 at 10:52:05AM -0500, Brijesh Singh wrote:
>> Thanks for the series, Sean.  Some thoughts.
>>
>>
>> On 7/31/20 4:23 PM, Sean Christopherson wrote:
>>> SEV currently needs to pin guest memory as it doesn't support migrating
>>> encrypted pages.  Introduce a framework in KVM's MMU to support pinning
>>> pages on demand without requiring additional memory allocations, and with
>>> (somewhat hazy) line of sight toward supporting more advanced features for
>>> encrypted guest memory, e.g. host page migration.
>>
>> Eric's attempt at lazy pinning suffers from the memory allocation
>> problem, and your series seems to address it.  As you have noticed, the
>> current SEV enablement in KVM does not support migrating encrypted
>> pages, but recent SEV firmware (FW >= 0.17) provides support for
>> migrating encrypted pages, i.e. host page migration.
>
> I assume SEV also doesn't support ballooning?  Ballooning would be a good
> first step toward page migration as I think it'd be easier for KVM to
> support, e.g. only needs to deal with the "zap" and not the "move".

Yes, ballooning does not work with SEV.

>
>>> The idea is to use a software available bit in the SPTE to track that a
>>> page has been pinned.  The decision to pin a page and the actual pinning
>>> management is handled by vendor code via kvm_x86_ops hooks.  There are
>>> intentionally two hooks (zap and unzap) introduced that are not needed for
>>> SEV.  I included them to again show how the flag (probably renamed?) could
>>> be used for more than just pin/unpin.
>>
>> If using the available software bits to track the pinning is acceptable,
>> then it can also be used for non-SEV guests (if needed).  I will look
>> through your patch more carefully, but one immediate question: when do
>> we unpin the pages?  In the case of SEV, once a page is pinned it must
>> not be unpinned until the guest terminates.  If we unpin a page before
>> the VM terminates, there is a chance that host page migration kicks in
>> and moves the page.  The KVM MMU code may drop SPTEs during zap/unzap,
>> which happens a lot during guest execution, and that would lead to a
>> path where vendor-specific code unpins pages while the guest is running
>> and causes data corruption for the SEV guest.
>
> The pages are unpinned by:
>
>   drop_spte()
>   |
>   -> rmap_remove()
>      |
>      -> sev_drop_pinned_spte()
>
>
> The intent is to allow unpinning pages when the mm_struct dies, i.e. when
> the memory is no longer reachable (as opposed to when the last reference to
> KVM is put), but typing that out, I realize there are dependencies and
> assumptions that don't hold true for SEV as implemented.

So, I tried this RFC with an SEV guest (of course after adding some of the
stuff you highlighted below), and the guest fails randomly.  I have seen two
or three kinds of failures, during 1) boot, 2) kernbench execution and
3) device addition/removal, and the failure signature is not consistent.
I believe that after addressing some of the dependencies we may be able to
make progress, but it will add new restrictions which did not exist before.
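
To make that restriction concrete, below is roughly what I expect the SEV
side of the unpin hook would have to do: never release a page while the
guest can still run, and only drop the references at VM teardown.  This is
purely a sketch -- the pinned_list bookkeeping in kvm_sev_info, the struct,
the helper names and the signatures are my placeholders, not code from this
RFC.

/*
 * Hypothetical sketch (arch/x86/kvm/svm/sev.c context, using to_kvm_svm()
 * from svm.h).  Instead of releasing the page when its SPTE is dropped,
 * queue it so the reference is only put at VM destruction; dropping the
 * pin earlier would let host page migration move an encrypted page.
 */
struct sev_pinned_page {
        struct list_head list;
        struct page *page;
};

static void sev_drop_pinned_spte(struct kvm *kvm, gfn_t gfn, int level,
                                 kvm_pfn_t pfn)
{
        struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
        struct sev_pinned_page *entry;

        /* mmu_lock is held on this path, so the allocation must not sleep. */
        entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
        if (!entry)
                return; /* leak the pin rather than risk migrating the page */

        entry->page = pfn_to_page(pfn);
        list_add_tail(&entry->list, &sev->pinned_list);
}

/* Called from the VM destruction path, once guest memory is unreachable. */
static void sev_unpin_all_pages(struct kvm_sev_info *sev)
{
        struct sev_pinned_page *entry, *next;

        list_for_each_entry_safe(entry, next, &sev->pinned_list, list) {
                put_page(entry->page);
                list_del(&entry->list);
                kfree(entry);
        }
}

The atomic allocation in the drop path is ugly; if the pin-time bookkeeping
already records every pinned pfn, the hook could probably be a nop for SEV
and the teardown path could walk that record instead.
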
>
> - Parent shadow pages won't be zapped.  Recycling MMU pages and zapping
>   all SPs due to memslot updates are the two concerns.
>
>   The easy way out for recycling is to not recycle SPs with pinned
>   children, though that may or may not fly with VMM admins.
>
>   I'm trying to resolve the memslot issue[*], but confirming that there's
>   no longer an issue with not zapping everything is proving difficult as
>   we haven't yet reproduced the original bug.
>
> - drop_large_spte() won't be invoked.  I believe the only semi-legitimate
>   scenario is if the NX huge page workaround is toggled on while a VM is
>   running.  Disallowing that if there is an SEV guest seems reasonable?
>
>   There might be an issue with the host page size changing, but I don't
>   think that can happen if the page is pinned.  That needs more
>   investigation.
>
> [*] https://lkml.kernel.org/r/20200703025047.13987-1-sean.j.christopherson@intel.com
>
>>> Bugs in the core implementation are pretty much guaranteed.  The basic
>>> concept has been tested, but in a fairly different incarnation.  Most
>>> notably, tagging PRESENT SPTEs as PINNED has not been tested, although
>>> using the PINNED flag to track zapped (and known to be pinned) SPTEs has
>>> been tested.  I cobbled this variation together fairly quickly to get the
>>> code out there for discussion.
>>>
>>> The last patch to pin SEV pages during sev_launch_update_data() is
>>> incomplete; it's there to show how we might leverage MMU-based pinning to
>>> support pinning pages before the guest is live.
>>
>> I will add the SEV-specific bits and give this a try.
>>
>>> Sean Christopherson (8):
>>>   KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
>>>   KVM: x86/mmu: Use bits 2:0 to check for present SPTEs
>>>   KVM: x86/mmu: Refactor handling of not-present SPTEs in mmu_set_spte()
>>>   KVM: x86/mmu: Add infrastructure for pinning PFNs on demand
>>>   KVM: SVM: Use the KVM MMU SPTE pinning hooks to pin pages on demand
>>>   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>>>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV
>>>   KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
>>>
>>>  arch/x86/include/asm/kvm_host.h |   7 ++
>>>  arch/x86/kvm/mmu.h              |   3 +
>>>  arch/x86/kvm/mmu/mmu.c          | 186 +++++++++++++++++++++++++-------
>>>  arch/x86/kvm/mmu/paging_tmpl.h  |   3 +-
>>>  arch/x86/kvm/svm/sev.c          | 141 +++++++++++++++++++++++-
>>>  arch/x86/kvm/svm/svm.c          |   3 +
>>>  arch/x86/kvm/svm/svm.h          |   3 +
>>>  7 files changed, 302 insertions(+), 44 deletions(-)
>>>
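
For reference while I wire up the SEV bits, this is the rough hook shape I
read out of the cover letter (pin/unpin plus the optional zap/unzap pair).
The member names and signatures below are my guesses for illustration, not
necessarily what the series actually defines.

/*
 * Guess at the vendor hooks described above: the MMU only tracks the
 * pinned flag in the SPTE, vendor code decides the policy.
 */
struct kvm_x86_ops {
        /* ... existing members ... */

        /* Return true if the new SPTE for @gfn should be tagged as pinned. */
        bool (*pin_spte)(struct kvm *kvm, gfn_t gfn, int level, kvm_pfn_t pfn);

        /* Called when a pinned SPTE is dropped for good, via rmap_remove(). */
        void (*drop_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level,
                                 kvm_pfn_t pfn);

        /* The optional pair that SEV would not need, per the cover letter. */
        void (*zap_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level);
        void (*unzap_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level);
};

If that reading is right, SEV only needs the pin and drop hooks, and the
failures I described above come down to what the drop hook is allowed to do
while the guest is live.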