On Thu, Jan 27, 2022 at 8:04 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> I would like to raise a question about this elephant in the room which
> I wanted to understand for quite a long time.
>
> For my nested AVIC work I once again need to change the
> KVM_REQ_GET_NESTED_STATE_PAGES code and once again I am asking myself,
> maybe we can get rid of this code, after all?

We (GCE) use it so that, during post-copy, a vCPU thread can exit to
userspace and demand these pages from the source itself, rather than
funneling all demands through a single "demand paging listener" thread,
which I believe is the equivalent of qemu's userfaultfd "fault handler"
thread.

Our (internal) post-copy mechanism scales quite well, because most demand
paging requests are triggered by an EPT violation, which happens to be a
convenient place to exit to userspace. Very few pages are typically
demanded as a result of kvm_vcpu_{read,write}_guest, where the vCPU thread
is so deep in the kernel call stack that it has to request the page via
the demand paging listener thread. With nested virtualization, the various
vmcs12 pages consulted directly by kvm (bypassing the EPT tables) were a
scalability issue. (Note that, unlike upstream, we don't call
nested_get_vmcs12_pages directly from VMLAUNCH/VMRESUME emulation; we
always call it as a result of this request that you don't like.)

As we work on converting from our (hacky) demand paging scheme to
userfaultfd, we will have to solve the scalability issue anyway (unless
someone else beats us to it). Eventually, I expect that our need for this
request will go away.

Honestly, without the exits to userspace, I don't really see how this
request buys you anything upstream. When I originally submitted it, I was
prepared for rejection, but Paolo said that qemu had a similar need for
it, and I happily never questioned that assertion.