Hi Oliver,
On 11/4/22 7:32 AM, Oliver Upton wrote:
On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. It's conflicting with that ring-based dirty page tracking always
requires a running VCPU context.
Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.
Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Whatever ordering requirements we settle on between these capabilities
needs to be documented as well.
[...]
It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
After using the dirty rings, the userspace needs to detect the capability
of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
need to be backed by per-slot bitmaps. With this capability advertised
and supported, it means the architecture can dirty guest pages without
vcpu/ring context, so that some of the dirty information will still be
maintained in the bitmap structure.
The description may be not obvious about the ordering. For this, I can
add the following sentence at end of the section.
The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
@@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
return -EINVAL;
return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+ case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+ if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+ !kvm->dirty_ring_size)
I believe this ordering requirement is problematic, as it piles on top
of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
creation.
Example:
- Enable KVM_CAP_DIRTY_LOG_RING
- Create some memslots w/ dirty logging enabled (note that the bitmap
is _not_ allocated)
- Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
- Save ITS tables and get a NULL dereference in
mark_page_dirty_in_slot():
if (vcpu && kvm->dirty_ring_size)
kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
else
-------> set_bit_le(rel_gfn, memslot->dirty_bitmap);
Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
DIRTY_LOG_RING has already been enabled, but the better approach would
be to explicitly check kvm_memslots_empty() such that the real
dependency is obvious. Peter, hadn't you mentioned something about
checking against memslots in an earlier revision?
The userspace (QEMU) needs to ensure that no dirty bitmap is created
before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
is enabled.
kvm_initialization
enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL // Where KVM_CAP_DIRTY_LOG_RING is enabled
board_initialization // Where QEMU knows if vgic/its is used
add_memory_slots
kvm_post_initialization
enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
:
start_migration
enable_dirty_page_tracking
create_dirty_bitmap // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled
Thanks,
Gavin