Hi Sean,
On 10/22/22 7:20 AM, Sean Christopherson wrote:
On Fri, Oct 21, 2022, Gavin Shan wrote:
What about inverting the naming to better capture that this is about the dirty
bitmap, and less so about the dirty ring? It's not obvious what "exclusive"
means, e.g. I saw this stub before reading the changelog and assumed it was
making a dirty ring exclusive to something.
Something like this?
bool kvm_use_dirty_bitmap(struct kvm *kvm)
{
return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
}
If you agree, I would rename is to kvm_dirty_ring_use_bitmap(). In this way,
we will have "kvm_dirty_ring" prefix for the function name, consistent with
other functions from same module.
I'd prefer to avoid "ring" in the name at all, because in the common case (well,
legacy case at least) the dirty ring has nothing to do with using the dirty
bitmap, e.g. this code ends up being very confusing because the "dirty_ring"
part implies that KVM _doesn't_ need to allocate the bitmap when the dirty ring
isn't being used.
if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
new->dirty_bitmap = NULL;
else if (old && old->dirty_bitmap)
new->dirty_bitmap = old->dirty_bitmap;
else if (kvm_dirty_ring_use_bitmap(kvm) {
r = kvm_alloc_dirty_bitmap(new);
if (r)
return r;
if (kvm_dirty_log_manual_protect_and_init_set(kvm))
bitmap_set(new->dirty_bitmap, 0, new->npages);
}
The helper exists because the dirty ring exists, but the helper is fundamentally
about the dirty bitmap, not the ring.
Thanks for the details. Yeah, it makes sense to avoid "ring" then. Lets use
the name kvm_use_dirty_bitmap() for the function.
But dirty_ring_with_bitmap really shouldn't need to exist. It's mandatory for
architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
architectures that don't. In other words, the API for enabling the dirty ring
is a bit ugly.
Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
released yet, and then KVM_CAP_DIRTY_LOG_ING_WITH_BITMAP on top, what about
usurping bits 63:32 of cap->args[0] for flags? E.g.
Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
and didn't require flags to be zero :-(
Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
enabled? I get why KVM would enumerate this info, i.e. allowing checking, but I
don't seen any value in supporting a second method for enabling the dirty ring.
The acquire-release thing is irrelevant for x86, and no other architecture
supports the dirty ring until this series, i.e. there's no need for KVM to detect
that userspace has been updated to gain acquire-release semantics, because the
fact that userspace is enabling the dirty ring on arm64 means userspace has been
updated.
Same goes for the "with bitmap" capability. There are no existing arm64 users,
so there's no risk of breaking existing userspace by suddenly shoving stuff into
the dirty bitmap.
KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
KVM_CAP_DIRTY_LOG_RING. The reverse is true (ignoring that x86 selects both and
is the only arch that selects the TSO variant).
Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...
If I didn't miss anything in the previous discussions, we don't want to make
KVM_CAP_DIRTY_LOG_RING_ACQ_REL and KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
architecture dependent. If they become architecture dependent, the userspace
will have different stubs (x86, arm64, other architectures to support
dirty-ring in future) to enable those capabilities. It's not friendly to
userspace. So I intend to prefer the existing pattern: advertise, enable. To
enable a capability without knowing if it's supported sounds a bit weird to
me.
Enabling without KVM advertising that it's supported would indeed be odd. Ugh,
and QEMU doesn't have existing checks to restrict the dirty ring to x86, i.e. we
can't make the ACQ_REL capability a true attribute without breaking userspace.
Rats.
Currently, QEMU doesn't use ACQ_REL and WITH_BITMAP. After both capability are
supported by kvm, we need go ahead to change QEMU so that these two capabilities
can be enabled in QEMU.
I think it's a good idea to enable KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP} as
flags, instead of standalone capabilities. In this way, those two capabilities can
be treated as sub-capability of KVM_CAP_DIRTY_LOG_RING. The question is how these
two flags can be exposed by kvm_vm_ioctl_check_extension_generic(), if we really
want to expose those two flags.
I don't understand your question on how KVM has wrong checks when KVM_CAP_DIRTY_LOG_RING
and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are enabled.
In the current code base, KVM only checks that _a_ form of dirty ring is supported,
by way of kvm_vm_ioctl_enable_dirty_log_ring()'s check on KVM_DIRTY_LOG_PAGE_OFFSET.
The callers don't verify that the "correct" capability is enabled.
case KVM_CAP_DIRTY_LOG_RING:
case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
E.g. userspace could do
if (kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
kvm_enable(KVM_CAP_DIRTY_LOG_RING)
and KVM would happily enable the dirty ring. Functionally it doesn't cause
problems, it's just weird.
Heh, we can fix without more ifdeffery by using the check internally.
Hmm, nice catch! Lets fix it up in a separate patch.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e30f1b4ecfa5..300489a0eba5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4585,6 +4585,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
}
case KVM_CAP_DIRTY_LOG_RING:
case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
+ if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
+ return -EINVAL;
return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
default:
return kvm_vm_ioctl_enable_cap(kvm, cap);
Thanks,
Gavin