Hi Peter and Marc,
On 9/29/22 7:50 PM, Gavin Shan wrote:
On 9/29/22 12:52 AM, Peter Xu wrote:
On Wed, Sep 28, 2022 at 09:25:34AM +0100, Marc Zyngier wrote:
On Wed, 28 Sep 2022 00:47:43 +0100,
Gavin Shan <gshan@xxxxxxxxxx> wrote:
I have rough idea as below. It's appreciated if you can comment before I'm
going a head for the prototype. The overall idea is to introduce another
dirty ring for KVM (kvm-dirty-ring). It's updated and visited separately
to dirty ring for vcpu (vcpu-dirty-ring).
- When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring
entries are added to mark those pages as 'always-dirty'. In mark_page_dirty_in_slot(),
those 'always-dirty' pages will be skipped, no entries pushed to vcpu-dirty-ring.
- Similar to vcpu-dirty-ring, kvm-dirty-ring is accessed from userspace through
mmap(kvm->fd). However, there won't have similar reset interface. It means
'struct kvm_dirty_gfn::flags' won't track any information as we do for
vcpu-dirty-ring. In this regard, kvm-dirty-ring is purely shared buffer to
advertise 'always-dirty' pages from host to userspace.
- For QEMU, shutdown/suspend/resume cases won't be concerning
us any more. The
only concerned case is migration. When the migration is about to complete,
kvm-dirty-ring entries are fetched and the dirty bits are updated to global
dirty page bitmap and RAMBlock's dirty page bitmap. For this, I'm still reading
the code to find the best spot to do it.
I think it makes a lot of sense to have a way to log writes that are
not generated by a vpcu, such as the GIC and maybe other things in the
future, such as DMA traffic (some SMMUs are able to track dirty pages
as well).
However, I don't really see the point in inventing a new mechanism for
that. Why don't we simply allow non-vpcu dirty pages to be tracked in
the dirty *bitmap*?
From a kernel perspective, this is dead easy:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..ae9138f29d51 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3305,7 +3305,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
#ifdef CONFIG_HAVE_KVM_DIRTY_RING
- if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+ if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
return;
#endif
@@ -3313,10 +3313,11 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
- if (kvm->dirty_ring_size)
+ if (vpcu && kvm->dirty_ring_size)
kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
- else
+ /* non-vpcu dirtying ends up in the global bitmap */
+ if (!vcpu && memslot->dirty_bitmap)
set_bit_le(rel_gfn, memslot->dirty_bitmap);
}
}
though I'm sure there is a few more things to it.
Yes, currently the bitmaps are not created when rings are enabled.
kvm_prepare_memory_region() has:
else if (!kvm->dirty_ring_size) {
r = kvm_alloc_dirty_bitmap(new);
But I think maybe that's a solution worth considering. Using the rings
have a major challenge on the limitation of ring size, so that for e.g. an
ioctl we need to make sure the pages to dirty within an ioctl procedure
will not be more than the ring can take. Using dirty bitmap for a last
phase sync of constant (but still very small amount of) dirty pages does
sound reasonable and can avoid that complexity. The payoff is we'll need
to allocate both the rings and the bitmaps.
Ok. I was thinking of using the bitmap to convey the dirty pages for
this particular case, where we don't have running vcpu. The concern I had
is the natural difference between a ring and bitmap. The ring-buffer is
discrete, comparing to bitmap. Besides, it sounds a little strange to
have two different sets of meta-data to track the data (dirty pages).
However, bitmap is easier way than per-vm ring. The constrains with
per-vm ring is just as Peter pointed. So lets reuse the bitmap to
convey the dirty pages for this particular case. I think the payoff,
extra bitmap, is acceptable. For this, we need another capability
(KVM_CAP_DIRTY_LOG_RING_BITMAP?) so that QEMU can collects the dirty
bitmap in the last phase of migration.
If all of us agree on this, I can send another kernel patch to address
this. QEMU still need more patches so that the feature can be supported.
I've had the following PATCH[v5 3/7] to reuse bitmap for these particular
cases. KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls are used to visit
the bitmap. The new capability is advertised by KVM_CAP_DIRTY_LOG_RING_BITMAP.
Note those two ioctls are disabled when dirty-ring is enabled, we need to
enable them accordingly.
PATCH[v5 3/7] KVM: x86: Use bitmap in ring-based dirty page tracking
I would like to post v5 after someone reviews or acks kvm/selftests part
of this series.
[...]
Thanks,
Gavin