On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote: > On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote: > > I think this further drives the point home -- there's zero need for the > > bitmap with dirty ring on x86, so why even support it? The proposal of > > ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that > > needs to dirty memory outside of a vCPU context can opt-in to the > > behavior. > > Yeah that sounds working too, but it'll be slightly hackish as then the > user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap. > With the new cap the user app can implement the whole ring with generic > code. Isn't the current route of exposing ALLOW_BITMAP on other arches for no reason headed in exactly that direction? Userspace would need to know if it _really_ needs the dirty bitmap in addition to the dirty ring, which could take the form of architecture ifdeffery. OTOH, if the cap is only exposed when it is absolutely necessary, an arch-generic live migration implementation could enable the cap whenever it is advertized and scan the bitmap accordingly. The VMM must know something about the architecture it is running on, as it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all... > Also more flexible to expose it as generic cap? E.g., one day x86 can > enable this too for whatever reason (even though I don't think so..). I had imagined something like this patch where the arch opts-in to some generic construct if it *requires* the use of both the ring and bitmap (very rough sketch). -- Thanks, Oliver
>From ec2fd1ec35b040ac5446f10b895b72d45e0d2845 Mon Sep 17 00:00:00 2001 From: Oliver Upton <oliver.upton@xxxxxxxxx> Date: Tue, 11 Oct 2022 00:57:00 +0000 Subject: [PATCH] KVM: Add support for using dirty ring in conjuction with bitmap Some architectures (such as arm64) need to dirty memory outside of the context of a vCPU. Of course, this simply doesn't fit with the UAPI of KVM's per-vCPU dirty ring. Introduce a new flavor of dirty ring that requires the use of both vCPU dirty rings and a dirty bitmap. The expectation is that for non-vCPU sources of dirty memory (such as the GIC ITS on arm64), KVM writes to the dirty bitmap. Userspace should scan the dirty bitmap before migrating the VM to the target. Use an additional capability to advertize this behavior and require explicit opt-in to avoid breaking the existing dirty ring ABI. And yes, you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do not allow userspace to enable dirty ring if it hasn't also enabled the ring && bitmap capability, as a VM is likely DOA without the pages marked in the bitmap. Signed-off-by: Oliver Upton <oliver.upton@xxxxxxxxx> --- include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 1 + virt/kvm/Kconfig | 8 ++++++++ virt/kvm/kvm_main.c | 16 ++++++++++++++-- 4 files changed, 24 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f4519d3689e1..76a6b1687312 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -779,6 +779,7 @@ struct kvm { pid_t userspace_pid; unsigned int max_halt_poll_ns; u32 dirty_ring_size; + bool dirty_ring_with_bitmap; bool vm_bugged; bool vm_dead; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0d5d4419139a..c87b5882d7ae 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 800f9470e36b..e3b45a5489b6 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL bool select HAVE_KVM_DIRTY_RING +# Only architectures that need to dirty memory outside of a vCPU +# context should select this, advertising to userspace the +# requirement to use a dirty bitmap in addition to the vCPU dirty +# ring. +config HAVE_KVM_DIRTY_RING_WITH_BITMAP + bool + depends on HAVE_KVM_DIRTY_RING + config HAVE_KVM_EVENTFD bool select EVENTFD diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5b064dbadaf4..5a99ab973706 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3304,7 +3304,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, { struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); -#ifdef CONFIG_HAVE_KVM_DIRTY_RING +#if defined(CONFIG_HAVE_KVM_DIRTY_RING) && !defined(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) return; #endif @@ -3313,7 +3313,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, unsigned long rel_gfn = gfn - memslot->base_gfn; u32 slot = (memslot->as_id << 16) | memslot->id; - if (kvm->dirty_ring_size) + if (vcpu && kvm->dirty_ring_size) kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn); else @@ -4488,6 +4488,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) #endif case KVM_CAP_BINARY_STATS_FD: case KVM_CAP_SYSTEM_EVENT_DATA: +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_ENABLE_BITMAP + case KVM_CAP_DIRTY_LOG_RING_ENABLE_BITMAP: +#endif return 1; default: break; @@ -4499,6 +4502,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size) { int r; +#ifdef CONFIG_HAVE_DIRTY_RING_WITH_BITMAP + if (!kvm->dirty_ring_with_bitmap) + return -EINVAL; +#endif + if (!KVM_DIRTY_LOG_PAGE_OFFSET) return -EINVAL; @@ -4588,6 +4596,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, case KVM_CAP_DIRTY_LOG_RING: case KVM_CAP_DIRTY_LOG_RING_ACQ_REL: return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]); + case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: + kvm->dirty_ring_with_bitmap = cap->args[0]; + + return 0; default: return kvm_vm_ioctl_enable_cap(kvm, cap); } -- 2.38.0.rc1.362.ged0d419d3c-goog