Hi Oliver,
On 11/4/22 9:06 AM, Oliver Upton wrote:
On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
On 11/4/22 7:32 AM, Oliver Upton wrote:
On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. It's conflicting with that ring-based dirty page tracking always
requires a running VCPU context.
Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.
Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Whatever ordering requirements we settle on between these capabilities
needs to be documented as well.
[...]
It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
After using the dirty rings, the userspace needs to detect the capability
of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
need to be backed by per-slot bitmaps. With this capability advertised
and supported, it means the architecture can dirty guest pages without
vcpu/ring context, so that some of the dirty information will still be
maintained in the bitmap structure.
The description may be not obvious about the ordering. For this, I can
add the following sentence at end of the section.
The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
@@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
return -EINVAL;
return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+ case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+ if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+ !kvm->dirty_ring_size)
I believe this ordering requirement is problematic, as it piles on top
of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
creation.
Example:
- Enable KVM_CAP_DIRTY_LOG_RING
- Create some memslots w/ dirty logging enabled (note that the bitmap
is _not_ allocated)
- Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
- Save ITS tables and get a NULL dereference in
mark_page_dirty_in_slot():
if (vcpu && kvm->dirty_ring_size)
kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
else
-------> set_bit_le(rel_gfn, memslot->dirty_bitmap);
Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
DIRTY_LOG_RING has already been enabled, but the better approach would
be to explicitly check kvm_memslots_empty() such that the real
dependency is obvious. Peter, hadn't you mentioned something about
checking against memslots in an earlier revision?
The userspace (QEMU) needs to ensure that no dirty bitmap is created
before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
It's unknown by QEMU that vgic/its is used when KVM_CAP_DIRTY_LOG_RING_ACQ_REL
is enabled.
I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.
So, looking at your example:
kvm_initialization
enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL // Where KVM_CAP_DIRTY_LOG_RING is enabled
board_initialization // Where QEMU knows if vgic/its is used
Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.
It's possible to add ARM specific helper kvm_arm_enable_dirty_ring_with_bitmap()
in qemu/target/arm.c, to enable RING_WITH_BITMAP if needed. The newly added
function can be called here when vgic/its is used.
add_memory_slots
kvm_post_initialization
enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
:
start_migration
enable_dirty_page_tracking
create_dirty_bitmap // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled
Just to make sure we're on the same page, there's two issues:
(1) If DIRTY_LOG_RING is enabled before memslot creation and
RING_WITH_BITMAP is enabled after memslots have been created w/
dirty logging enabled, memslot->dirty_bitmap == NULL and the
kernel will fault when attempting to save the ITS tables.
(2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
enabled after memslots have been created w/ dirty logging enabled,
memslot->dirty_bitmap != NULL and that memory is wasted until the
memslot is freed.
I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.
Yes, I got your points. Case (2) is still possible to happen with QEMU
excluded. However, QEMU is always enabling DIRTY_LOG_RING[_ACQ_REL] before
any memory slot is created. I agree that we need to ensure there are no
memory slots when DIRTY_LOG_RING[_ACQ_REL] is enabled.
For case (1), we can ensure RING_WTIH_BITMAP is enabled before any memory
slot is added, as below. QEMU needs a new helper (as above) to enable it
on board's level.
Lets fix both with a new helper in PATCH[v8 4/9] like below?
static inline bool kvm_vm_has_memslot_pages(struct kvm *kvm)
{
bool has_memslot_pages;
mutex_lock(&kvm->slots_lock);
has_memslot_pages = !!kvm->nr_memslot_pages;
mutex_unlock(&kvm->slots_lock);
return has_memslot_pages;
}
static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
:
switch (cap->cap) {
case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
!kvm->dirty_ring_size || kvm_vm_has_memslot_pages(kvm))
return -EINVAL;
kvm->dirty_ring_with_bitmap = true;
return 0;
default:
return kvm_vm_ioctl_enable_cap(kvm, cap);
}
}
static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
{
:
/* We only allow it to set once */
if (kvm->dirty_ring_size)
return -EINVAL;
if (kvm_vm_has_memslot_pages(kvm))
return -EINVAL;
:
}
Thanks,
Gavin
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm