On Mon, May 08, 2023, Yan Zhao wrote: > On Thu, May 04, 2023 at 10:17:20AM +0800, Yan Zhao wrote: > > On Wed, May 03, 2023 at 04:16:10PM -0700, Sean Christopherson wrote: > > > Finally getting back to this series... > > > > > > On Thu, Mar 23, 2023, Yan Zhao wrote: > > > > On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote: > > > > > On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote: > > > > > ... > > > > > > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn) > > > > > > +{ > > > > > > + struct kvm_memory_slot *slot; > > > > > > + int idx; > > > > > > + > > > > > > + idx = srcu_read_lock(&kvm->srcu); > > > > > > + > > > > > > + slot = gfn_to_memslot(kvm, gfn); > > > > > > + if (!slot) { > > > > > > + srcu_read_unlock(&kvm->srcu, idx); > > > > > > + return -EINVAL; > > > > > > + } > > > > > > + > > > > > Also fail if slot->flags & KVM_MEMSLOT_INVALID is true? > > > > > There should exist a window for external users to see an invalid slot > > > > > when a slot is about to get deleted/moved. > > > > > (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()). > > > > > > > > Or using > > > > if (!kvm_is_visible_memslot(slot)) { > > > > srcu_read_unlock(&kvm->srcu, idx); > > > > return -EINVAL; > > > > } > > > > Hi Sean, > After more thoughts, do you think checking KVM internal memslot is necessary? I don't think it's necessary per se, but I also can't think of any reason to allow it. > slot = gfn_to_memslot(kvm, gfn); > if (!slot || slot->id >= KVM_USER_MEM_SLOTS) { > srcu_read_unlock(&kvm->srcu, idx); > return -EINVAL; > } > > Do we allow write tracking to APIC access page when APIC-write VM exit > is not desired? Allow? Yes. But KVM doesn't use write-tracking for anything APICv related, e.g. to disable APICv, KVM instead zaps the SPTEs for the APIC access page and on page fault goes straight to MMIO emulation. Theoretically, the guest could create an intermediate PTE in the APIC access page and AFAICT KVM would shadow the access and write-protect the APIC access page. But that's benign as the resulting emulation would be handled just like emulated APIC MMIO. FWIW, the other internal memslots, TSS and idenity mapped page tables, are used if and only if paging is disabled in the guest, i.e. there are no guest PTEs for KVM to shadow (and paging must be enabled to enable VMX, so nested EPT is also ruled out). So this is theoretically possible only for the APIC access page. That changes with KVMGT, but that again should not be problematic. KVM will emulate in response to the write-protected page and things go on. E.g. it's arguably much weirder that the guest can read/write the identity mapped page tables that are used for EPT without unrestricted guest. There's no sane reason to allow creating PTEs in the APIC page, but I'm also not all that motivated to "fix" things. account_shadowed() isn't expected to fail, so KVM would need to check further up the stack, e.g. in walk_addr_generic() by open coding a form of kvm_vcpu_gfn_to_hva_prot(). I _think_ that's the only place KVM would need to add a check, as KVM already checks that the root, i.e. CR3, is in a "visible" memslot. I suppose KVM could just synthesize triple fault, like it does for the root/CR3 case, but I don't like making up behavior. In other words, I'm not opposed to disallowing write-tracking internal memslots, but I can't think of anything that will break, and so for me personally at least, the ROI isn't sufficient to justify writing tests and dealing with any fallout.