Hello All,

On Tue, Dec 08, 2020 at 10:29:05AM -0600, Brijesh Singh wrote:
>
> On 12/7/20 9:09 PM, Steve Rutherford wrote:
> > On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >> On Sun, Dec 06, 2020, Paolo Bonzini wrote:
> >>> On 03/12/20 01:34, Sean Christopherson wrote:
> >>>> On Tue, Dec 01, 2020, Ashish Kalra wrote:
> >>>>> From: Brijesh Singh <brijesh.singh@xxxxxxx>
> >>>>>
> >>>>> KVM hypercall framework relies on alternative framework to patch the
> >>>>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> >>>>> apply_alternative() is called then it defaults to VMCALL. The approach
> >>>>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> >>>>> will be able to decode the instruction and do the right things. But
> >>>>> when SEV is active, guest memory is encrypted with guest key and
> >>>>> hypervisor will not be able to decode the instruction bytes.
> >>>>>
> >>>>> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> >>>>> will be used by the SEV guest to notify encrypted pages to the hypervisor.
> >>>> What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> >>>> and opt into VMCALL? It's a synthetic feature flag either way, and I don't
> >>>> think there are any existing KVM hypercalls that happen before alternatives are
> >>>> patched, i.e. it'll be a nop for sane kernel builds.
> >>>>
> >>>> I'm also skeptical that a KVM specific hypercall is the right approach for the
> >>>> encryption behavior, but I'll take that up in the patches later in the series.
> >>> Do you think that it's the guest that should "donate" memory for the bitmap
> >>> instead?
> >> No. Two things I'd like to explore:
> >>
> >>  1. Making the hypercall to announce/request private vs. shared common across
> >>     hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* and TDX).
> >>     I'm concerned that we'll end up with multiple hypercalls that do more or
> >>     less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc... Maybe it's a
> >>     pipe dream, but I'd like to at least explore options before shoving in
> >>     KVM-only hypercalls.
> >>
> >>  2. Tracking shared memory via a list of ranges instead of a using bitmap to
> >>     track all of guest memory. For most use cases, the vast majority of guest
> >>     memory will be private, most ranges will be 2mb+, and conversions between
> >>     private and shared will be uncommon events, i.e. the overhead to walk and
> >>     split/merge list entries is hopefully not a big concern. I suspect a list
> >>     would consume far less memory, hopefully without impacting performance.
> > For a fancier data structure, I'd suggest an interval tree. Linux
> > already has an rbtree-based interval tree implementation, which would
> > likely work, and would probably assuage any performance concerns.
> >
> > Something like this would not be worth doing unless most of the shared
> > pages were physically contiguous. A sample Ubuntu 20.04 VM on GCP had
> > 60ish discontiguous shared regions. This is by no means a thorough
> > search, but it's suggestive. If this is typical, then the bitmap would
> > be far less efficient than most any interval-based data structure.
> >
> > You'd have to allow userspace to upper bound the number of intervals
> > (similar to the maximum bitmap size), to prevent host OOMs due to
> > malicious guests. There's something nice about the guest donating
> > memory for this, since that would eliminate the OOM risk.
> >
> Tracking the list of ranges may not be bad idea, especially if we use
> the some kind of rbtree-based data structure to update the ranges. It
> will certainly be better than bitmap which grows based on the guest
> memory size and as you guys see in the practice most of the pages will
> be guest private.
> I am not sure if guest donating a memory will cover
> all the cases, e.g what if we do a memory hotplug (increase the guest
> ram from 2GB to 64GB), will donated memory range will be enough to store
> the metadata.
>

With reference to internal discussions regarding the above, I am going to
look into the specific items listed below:

1). "hypercall" related:
a). Explore the SEV-SNP page change request structure (included in GHCB),
see if there is something common there that can be re-used for SEV/SEV-ES
page encryption status hypercalls.
b). Explore if there is any common hypercall framework I can use in
Linux/KVM.

2). Related to the "backing" data structure: explore using a range-based
list or something like an rbtree-based interval tree data structure
(as mentioned by Steve above) to replace the current bitmap-based
implementation.

Thanks,
Ashish
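
To make the range-list idea from item 2) a bit more concrete, here is a
rough userspace model in Python. It is only a sketch and not kernel code:
the names (SharedRanges, mark_shared, mark_private, bitmap_bytes), the
4 KiB page size, the one-bit-per-page bitmap layout and the max_ranges cap
are all assumptions made for illustration. A real implementation would more
likely sit on top of the existing rbtree-based interval tree. The sorted
list below just shows the merge/split behaviour and the userspace-imposed
upper bound that Steve mentioned.

```python
from bisect import bisect_right

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


def bitmap_bytes(guest_bytes):
    """Size of a one-bit-per-page bitmap covering guest_bytes of guest RAM."""
    pages = (guest_bytes + PAGE_SIZE - 1) // PAGE_SIZE
    return (pages + 7) // 8


class SharedRanges:
    """Sorted, non-overlapping [start, end) page-frame ranges that are shared.

    Hypothetical model: everything not listed is treated as private.
    """

    def __init__(self, max_ranges=1024):
        # Cap set by the host side so a guest cannot grow the list unbounded.
        self.max_ranges = max_ranges
        self.ranges = []

    def mark_shared(self, start, end):
        """Mark [start, end) shared, merging overlapping or adjacent ranges."""
        if start >= end:
            raise ValueError("empty range")
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))  # neither overlapping nor touching
            else:
                start, end = min(s, start), max(e, end)  # absorb into new range
        kept.append((start, end))
        kept.sort()
        if len(kept) > self.max_ranges:
            raise MemoryError("range limit exceeded")
        self.ranges = kept

    def mark_private(self, start, end):
        """Mark [start, end) private, splitting any range it cuts through."""
        if start >= end:
            raise ValueError("empty range")
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))  # untouched by this conversion
                continue
            if s < start:
                kept.append((s, start))  # left remainder stays shared
            if e > end:
                kept.append((end, e))  # right remainder stays shared
        if len(kept) > self.max_ranges:
            raise MemoryError("range limit exceeded")
        self.ranges = kept

    def is_shared(self, gfn):
        """Return True if page frame gfn lies inside a shared range."""
        i = bisect_right(self.ranges, (gfn, float("inf"))) - 1
        return i >= 0 and self.ranges[i][0] <= gfn < self.ranges[i][1]
```

Under these assumptions the bitmap is 64 KiB for a 2 GiB guest and 2 MiB
for a 64 GiB guest, growing with guest RAM regardless of how many pages are
actually shared. A list of roughly 60 ranges stored as two 64-bit values
each would be about 960 bytes, and its size depends on the number of
discontiguous shared regions rather than on guest RAM, which is also why
memory hotplug does not by itself change its footprint.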