Hi Gavin, On Wed, 28 Sep 2022 00:47:43 +0100, Gavin Shan <gshan@xxxxxxxxxx> wrote: > I have rough idea as below. It's appreciated if you can comment before I'm > going a head for the prototype. The overall idea is to introduce another > dirty ring for KVM (kvm-dirty-ring). It's updated and visited separately > to dirty ring for vcpu (vcpu-dirty-ring). > > - When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring > entries are added to mark those pages as 'always-dirty'. In mark_page_dirty_in_slot(), > those 'always-dirty' pages will be skipped, no entries pushed to vcpu-dirty-ring. > > - Similar to vcpu-dirty-ring, kvm-dirty-ring is accessed from userspace through > mmap(kvm->fd). However, there won't have similar reset interface. It means > 'struct kvm_dirty_gfn::flags' won't track any information as we do for > vcpu-dirty-ring. In this regard, kvm-dirty-ring is purely shared buffer to > advertise 'always-dirty' pages from host to userspace. > - For QEMU, shutdown/suspend/resume cases won't be concerning > us any more. The > only concerned case is migration. When the migration is about to complete, > kvm-dirty-ring entries are fetched and the dirty bits are updated to global > dirty page bitmap and RAMBlock's dirty page bitmap. For this, I'm still reading > the code to find the best spot to do it. I think it makes a lot of sense to have a way to log writes that are not generated by a vpcu, such as the GIC and maybe other things in the future, such as DMA traffic (some SMMUs are able to track dirty pages as well). However, I don't really see the point in inventing a new mechanism for that. Why don't we simply allow non-vpcu dirty pages to be tracked in the dirty *bitmap*? >From a kernel perspective, this is dead easy: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5b064dbadaf4..ae9138f29d51 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3305,7 +3305,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); #ifdef CONFIG_HAVE_KVM_DIRTY_RING - if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) + if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm)) return; #endif @@ -3313,10 +3313,11 @@ void mark_page_dirty_in_slot(struct kvm *kvm, unsigned long rel_gfn = gfn - memslot->base_gfn; u32 slot = (memslot->as_id << 16) | memslot->id; - if (kvm->dirty_ring_size) + if (vpcu && kvm->dirty_ring_size) kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn); - else + /* non-vpcu dirtying ends up in the global bitmap */ + if (!vcpu && memslot->dirty_bitmap) set_bit_le(rel_gfn, memslot->dirty_bitmap); } } though I'm sure there is a few more things to it. To me, this is just a relaxation of an arbitrary limitation, as the current assumption that only vcpus can dirty memory doesn't hold at all. Thanks, M. -- Without deviation from the norm, progress is not possible.