On Fri, Oct 28, 2022, Gavin Shan wrote:
> Hi Sean and Marc,
> 
> On 10/28/22 2:30 AM, Marc Zyngier wrote:
> > On Thu, 27 Oct 2022 18:44:51 +0100,
> > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > 
> > > On Thu, Oct 27, 2022, Marc Zyngier wrote:
> > > > On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > [...]
> > > 
> > > > > And ideally such bugs would be detected without relying on userspace to
> > > > > enable dirty logging, e.g. the Hyper-V bug lurked for quite some
> > > > > time and was only found when mark_page_dirty_in_slot() started
> > > > > WARNing.
> > > > > 
> > > > > I'm ok if arm64 wants to let userspace shoot itself in the foot with
> > > > > the ITS, but I'm not ok dropping the protections in the common
> > > > > mark_page_dirty_in_slot().
> > > > > 
> > > > > One somewhat gross idea would be to let architectures override the
> > > > > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> > > > > in kvm->arch in its kvm_write_guest_lock() to note that an expected
> > > > > write without a vCPU is in-progress:
> > > > > 
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > index 8c5c69ba47a7..d1da8914f749 100644
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > > > >  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > > > >  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > > > > -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > > > > +	if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> > > > > +		return;
> > > > > +
> > > > > +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> > > > >  		return;
> > > > >  #endif
> > > > > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > > > >  		unsigned long rel_gfn = gfn - memslot->base_gfn;
> > > > >  		u32 slot = (memslot->as_id << 16) | memslot->id;
> > > > > -		if (kvm->dirty_ring_size)
> > > > > +		if (kvm->dirty_ring_size && vcpu)
> > > > >  			kvm_dirty_ring_push(&vcpu->dirty_ring,
> > > > >  					    slot, rel_gfn);
> > > > > -		else
> > > > > +		else if (memslot->dirty_bitmap)
> > > > >  			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > > > >  	}
> > > > >  }

...

> > > A slightly different alternative would be to have a completely separate
> > > API for writing guest memory without an associated vCPU. I.e. start
> > > building up proper device emulation support. Then the vCPU-based
> > > APIs could yell if a vCPU isn't provided (or there is no running
> > > vCPU in the current mess). And the device-based API could be
> > > provided if and only if the architecture actually supports emulating
> > > writes from devices, i.e. x86 would not opt-in and so wouldn't even
> > > have access to the API.
> > 
> > Which is what I was putting under the "major surgery" label in my
> > previous email.
> > 
> > Anyhow, for the purpose of unblocking Gavin's series, I suggest to
> > adopt your per-arch opt-out suggestion as a stop gap measure, and we
> > will then be able to bike-shed for weeks on what the shape of the
> > device-originated memory dirtying API should be.
> > 
> 
> It's really a 'major surgery' and I would like to make sure I fully understand
> "a completely separate API for writing guest memory without an associated vCPU",
> before I'm going to work on v7 for this.
> 
> There are 7 functions and 2 macros involved as below. I assume Sean is suggesting
> to add another argument, whose name can be 'has_vcpu', for these functions and macros?

No.
As Marc suggested, for your series just implement the hacky arch opt-out; don't
try to do surgery at this time, as that's likely going to be a months-long
effort that touches a lot of cross-arch code.

E.g. I believe the ARM opt-out (opt-in?) for the above hack would be:

  bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
  {
          return vgic_has_its(kvm);
  }
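
Something like the below could serve as the common-code default, so that
architectures without a use case don't need to do anything. This is only a
sketch: the __weak default and the file placement are guesses on my part, not
something settled in this thread.

  /* Declared somewhere common, e.g. include/linux/kvm_dirty_ring.h (a guess). */
  bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);

  /*
   * Common default, e.g. in virt/kvm/kvm_main.c: preserve today's behaviour
   * and disallow vCPU-less writes unless the architecture explicitly
   * overrides this helper with a strong definition.
   */
  bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
  {
          return false;
  }

With that default, x86 and every other architecture keep the existing
WARN_ON_ONCE(!vcpu) in mark_page_dirty_in_slot(), and only arm64 (when an ITS
is instantiated) takes the relaxed path from the diff above.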