On 3/29/2022 2:30 AM, Sean Christopherson wrote:
> On Tue, Mar 08, 2022, Nikunj A Dadhania wrote:
>> This is a follow-up to the RFC implementation [1] that incorporates
>> review feedback and bug fixes. See the "RFC v1" section below for a
>> list of changes.
>
> Heh, for future reference, the initial posting of a series/patch/RFC is implicitly
> v1, i.e. this should be RFC v2.

Sure.

>> SEV guests require the guest's pages to be pinned in host physical
>> memory, as migration of encrypted pages is not supported. The memory
>> encryption scheme uses the physical address of the memory being
>> encrypted. If guest pages are moved by the host, content decrypted in
>> the guest would be incorrect, thereby corrupting the guest's memory.
>>
>> For SEV/SEV-ES guests, the hypervisor doesn't know which pages are
>> encrypted and when the guest is done using those pages. The hypervisor
>> should treat all the guest pages as encrypted until they are
>> deallocated or the guest is destroyed.
>>
>> While provisioning a pfn, make KVM aware that guest pages need to be
>> pinned long-term and use the appropriate pin_user_pages API for these
>> special encrypted memory regions. KVM takes the first reference and
>> holds it until a mapping is done. Take an extra reference before KVM
>> releases the pfn.
>>
>> Actual pinning management is handled by vendor code via new
>> kvm_x86_ops hooks. The MMU calls into vendor code to pin the page on
>> demand. Metadata of the pinning is stored in the architecture-specific
>> memslot area. During the memslot freeing path and the deallocation path,
>> guest pages are unpinned.
>>
>> Guest boot time comparison:
>> +---------------+----------------+-------------------+
>> | Guest Memory  | baseline       | Demand Pinning +  |
>> | Size (GB)     | v5.17-rc6(secs)| v5.17-rc6(secs)   |
>> +---------------+----------------+-------------------+
>> |      4        |      6.16      |       5.71        |
>> +---------------+----------------+-------------------+
>> |     16        |      7.38      |       5.91        |
>> +---------------+----------------+-------------------+
>> |     64        |     12.17      |       6.16        |
>> +---------------+----------------+-------------------+
>> |    128        |     18.20      |       6.50        |
>> +---------------+----------------+-------------------+
>> |    192        |     24.56      |       6.80        |
>> +---------------+----------------+-------------------+
>
> Let me preface this by saying I generally like the idea and especially the
> performance, but...
>
> I think we should abandon this approach in favor of committing all our resources
> to fd-based private memory[*], which (if done right) will provide on-demand pinning
> for "free".

I will give this a try for SEV; it was on my todo list.

> I would much rather get that support merged sooner than later, and use
> it as a carrot for legacy SEV to get users to move over to its new APIs, with a long
> term goal of deprecating and disallowing SEV/SEV-ES guests without fd-based private
> memory.
>
> That would require guest kernel support to communicate private vs. shared,

Could you explain this in more detail? Is this required for punching holes for shared pages?

> but SEV guests already "need" to do that to play nice with live migration, so it's
> not a big ask, just another carrot to entice guests/customers to update their kernel
> (and possibly users to update their guest firmware).
>
> This series isn't awful by any means, but it requires poking into core flows and
> further complicates paths that are already anything but simple.  And things like
> conditionally grabbing vCPU0 to pin pages in its MMU make me flinch.  And I think
> the situation would only get worse by the time all the bugs and corner cases are
> ironed out.  E.g. this code is wrong:
>
>	void kvm_release_pfn_clean(kvm_pfn_t pfn)
>	{
>		if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) {
>			struct page *page = pfn_to_page(pfn);
>
>			if (page_maybe_dma_pinned(page))
>				unpin_user_page(page);
>			else
>				put_page(page);
>		}
>	}
>	EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
>
> Because (a) page_maybe_dma_pinned() is susceptible to false positives (clearly
> documented), and (b) even if it didn't get false positives, there's no guarantee
> that _KVM_ owns a pin of the page.

Right, the pinning could have been done by some other subsystem.
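
To make that concrete, below is a rough, untested sketch of the direction a
correct release path would probably have to take: consult KVM's own pin
bookkeeping instead of guessing with page_maybe_dma_pinned(), so that KVM only
drops pins it took itself. Both kvm_release_pfn_clean_slot() and
kvm_memslot_page_is_pinned() are hypothetical names; the latter stands in for a
lookup into the per-memslot pin metadata this series keeps.

#include <linux/kvm_host.h>
#include <linux/mm.h>

/*
 * Rough sketch only, not a real patch.  kvm_memslot_page_is_pinned() is a
 * hypothetical helper that would consult KVM's own per-memslot pin metadata,
 * i.e. the record of whether *KVM* pinned this gfn, rather than inferring
 * ownership from page_maybe_dma_pinned().
 */
void kvm_release_pfn_clean_slot(struct kvm_memory_slot *slot, gfn_t gfn,
				kvm_pfn_t pfn)
{
	struct page *page;

	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
		return;

	page = pfn_to_page(pfn);

	/* Only unpin pages that KVM's own bookkeeping says KVM pinned. */
	if (slot && kvm_memslot_page_is_pinned(slot, gfn))
		unpin_user_page(page);
	else
		put_page(page);
}

Even then, every caller releasing a pfn would need the slot/gfn in hand, which
is exactly the auditing/plumbing problem you describe below.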
> It's not an impossible problem to solve, but I suspect any solution will require
> either touching a lot of code or will be fragile and difficult to maintain, e.g.
> by auditing all users to understand which need to pin and which don't.  Even if
> we _always_ pin memory for SEV guests, we'd still need to plumb the "is SEV guest"
> info around.
>
> And FWIW, my years-old idea of using a software-available SPTE bit to track pinned
> pages is plagued by the same underlying issue: KVM's current management (or lack
> thereof) of SEV guest memory just isn't viable long term.  In all honesty, it
> probably should never have been merged.  We can't change the past, but we can,
> and IMO should, avoid piling on more code to an approach that is fundamentally flawed.
>
> [*] https://lore.kernel.org/all/20220310140911.50924-1-chao.p.peng@xxxxxxxxxxxxxxx

Thanks for the valuable feedback.

Regards
Nikunj