Now that dirty bitmaps are accessible from user space, we export the addresses of these to achieve zero-copy dirty log check: KVM_GET_USER_DIRTY_LOG_ADDR We also need an API for triggering dirty bitmap switch to take the full advantage of double buffered bitmaps. KVM_SWITCH_DIRTY_LOG See the documentation in this patch for precise explanations. About performance improvement: the most important feature of switch API is the lightness. In our test, this appeared in the form of improved responses for GUI manipulations. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx> Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx> CC: Avi Kivity <avi@xxxxxxxxxx> CC: Alexander Graf <agraf@xxxxxxx> --- Documentation/kvm/api.txt | 87 +++++++++++++++++++++++++++++++++++++++++++++ arch/ia64/kvm/kvm-ia64.c | 27 +++++++++----- arch/powerpc/kvm/book3s.c | 18 +++++++-- arch/x86/kvm/x86.c | 44 ++++++++++++++++------- include/linux/kvm.h | 11 ++++++ include/linux/kvm_host.h | 4 ++- virt/kvm/kvm_main.c | 63 +++++++++++++++++++++++++++++---- 7 files changed, 220 insertions(+), 34 deletions(-) diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index a237518..c106c83 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -892,6 +892,93 @@ arguments. This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace. +4.39 KVM_GET_USER_DIRTY_LOG_ADDR + +Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below) +Architectures: all +Type: vm ioctl +Parameters: struct kvm_user_dirty_log (in/out) +Returns: 0 on success, -1 on error + +This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead +of the old dirty log manipulation by KVM_GET_DIRTY_LOG. + +A bit about KVM_CAP_USER_DIRTY_LOG + +Before this ioctl was introduced, dirty bitmaps for dirty page logging were +allocated in the kernel's memory space. But we have now moved them to user +space to get more flexiblity and performance. To achive this move without +breaking the compatibility, we will split KVM_CAP_USER_DIRTY_LOG capability +into a few generations which can be identified by its check extension +return values. + +This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the +KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations. + +What you get + +By using this, you can get paired bitmap addresses which are accessible from +user space. See the explanation in 4.40 for the roles of these two bitmaps. + +How to Get + +Before calling this, you have to set the slot member of kvm_user_dirty_log +to indicate the target memory slot. + +struct kvm_user_dirty_log { + __u32 slot; + __u32 flags; + __u64 dirty_bitmap; + __u64 dirty_bitmap_old; +}; + +The addresses will be set in the paired members: dirty_bitmap and _old. + +Note + +In generation 1, we support bitmaps which are created in kernel but do not +support bitmaps created by users. This means bitmap creation/destruction +are done same as before when you instruct KVM by KVM_SET_USER_MEMORY_REGION +(see 4.34) to start/stop logging. Please do not try to free the exported +bitmaps by yourself, or KVM will access the freed area and end with fault. + +4.40 KVM_SWITCH_DIRTY_LOG + +Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see 4.39) +Architectures: all +Type: vm ioctl +Parameters: memory slot id +Returns: 0 if switched, 1 if not (slot was clean), -1 on error + +This ioctl allows you to switch the dirty log to the next one: a newer +ioctl for getting dirty page logs than KVM_GET_DIRTY_LOG (see 4.7 for the +explanation about dirty page logging, log format is not changed). + +If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is strongly +recommended than using KVM_GET_DIRTY_LOG because this does not need buffer +copy between kernel and user space. + +How to Use + +Before calling this, you have to remember the paired addresses of dirty +bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR (see 4.39): +dirty_bitmap (being used now by kernel) and dirty_bitmap_old (not being +used now and containing the last log). + +After calling this, the role of these bitmaps will change like this; +If the return value was 0, kernel cleared dirty_bitmap_old and began to use +it for the next logging, so that you can use the cold dirty_bitmap to check +the log since the last switch. If the return value was 1, all pages were not +dirty and bitmap switch was not done. Note that remembering which bitmap is +now active is your responsibility. So you have to update your remembering +when you get the return value 0. + +Note + +Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG. Please use +either one, preferably this one, only to avoid extra confusion. We do not +guarantee on which condition KVM_GET_DIRTY_LOG causes bitmap switch. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 03503e6..b590b80 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1801,8 +1801,7 @@ void kvm_arch_exit(void) kvm_vmm_info = NULL; } -static int kvm_ia64_sync_dirty_log(struct kvm *kvm, - struct kvm_dirty_log *log) +static int kvm_ia64_sync_dirty_log(struct kvm *kvm, int slot) { struct kvm_memory_slot *memslot; int r, i; @@ -1812,10 +1811,10 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm, offsetof(struct kvm_vm_data, kvm_mem_dirty_log)); r = -EINVAL; - if (log->slot >= KVM_MEMORY_SLOTS) + if (slot >= KVM_MEMORY_SLOTS) goto out; - memslot = &kvm->memslots->memslots[log->slot]; + memslot = &kvm->memslots->memslots[slot]; r = -ENOENT; if (!memslot->dirty_bitmap) goto out; @@ -1843,8 +1842,8 @@ out: return r; } -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, - struct kvm_dirty_log *log) +static int kvm_ia64_update_dirty_log(struct kvm *kvm, int slot, + unsigned long __user *log_bitmap) { int r; unsigned long n; @@ -1853,15 +1852,15 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, mutex_lock(&kvm->slots_lock); spin_lock(&kvm->arch.dirty_log_lock); - r = kvm_ia64_sync_dirty_log(kvm, log); + r = kvm_ia64_sync_dirty_log(kvm, slot); if (r) goto out; - r = kvm_get_dirty_log(kvm, log); + r = kvm_update_dirty_log(kvm, slot, log_bitmap); if (r) goto out; - memslot = &kvm->memslots->memslots[log->slot]; + memslot = &kvm->memslots->memslots[slot]; /* If nothing is dirty, don't bother messing with page tables. */ if (memslot->is_dirty) { kvm_flush_remote_tlbs(kvm); @@ -1879,6 +1878,16 @@ out: return r; } +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +{ + return kvm_ia64_update_dirty_log(kvm, log->slot, log->dirty_bitmap); +} + +int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot) +{ + return kvm_ia64_update_dirty_log(kvm, slot, NULL); +} + int kvm_arch_hardware_setup(void) { return 0; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2a31d2f..54b3a76 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1185,8 +1185,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, /* * Get (and clear) the dirty memory log for a memory slot. */ -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, - struct kvm_dirty_log *log) +static int kvmppc_update_dirty_log(struct kvm *kvm, int slot, + unsigned long __user *log_bitmap) { struct kvm_memory_slot *memslot; struct kvm_vcpu *vcpu; @@ -1196,11 +1196,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, mutex_lock(&kvm->slots_lock); - r = kvm_get_dirty_log(kvm, log); + r = kvm_update_dirty_log(kvm, slot, log_bitmap); if (r) goto out; - memslot = &kvm->memslots->memslots[log->slot]; + memslot = &kvm->memslots->memslots[slot]; /* If nothing is dirty, don't bother messing with page tables. */ if (memslot->is_dirty) { ga = memslot->base_gfn << PAGE_SHIFT; @@ -1223,6 +1223,16 @@ out: return r; } +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +{ + return kvmppc_update_dirty_log(kvm, log->slot, log->dirty_bitmap); +} + +int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot) +{ + return kvmppc_update_dirty_log(kvm, slot, NULL); +} + int kvmppc_core_check_processor_compat(void) { return 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 32a3d94..7a31ab1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2737,8 +2737,8 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm, /* * Get (and clear) the dirty memory log for a memory slot. */ -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, - struct kvm_dirty_log *log) +static int kvm_x86_update_dirty_log(struct kvm *kvm, int slot, + unsigned long __user *log_bitmap) { int r; struct kvm_memory_slot *memslot; @@ -2747,10 +2747,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, mutex_lock(&kvm->slots_lock); r = -EINVAL; - if (log->slot >= KVM_MEMORY_SLOTS) + if (slot >= KVM_MEMORY_SLOTS) goto out; - memslot = &kvm->memslots->memslots[log->slot]; + memslot = &kvm->memslots->memslots[slot]; r = -ENOENT; if (!memslot->dirty_bitmap) goto out; @@ -2764,7 +2764,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, unsigned long __user *dirty_bitmap_old; spin_lock(&kvm->mmu_lock); - kvm_mmu_slot_remove_write_access(kvm, log->slot); + kvm_mmu_slot_remove_write_access(kvm, slot); spin_unlock(&kvm->mmu_lock); dirty_bitmap = memslot->dirty_bitmap; @@ -2779,22 +2779,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, goto out; memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); - slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old; - slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap; - slots->memslots[log->slot].is_dirty = false; + slots->memslots[slot].dirty_bitmap = dirty_bitmap_old; + slots->memslots[slot].dirty_bitmap_old = dirty_bitmap; + slots->memslots[slot].is_dirty = false; old_slots = kvm->memslots; rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); kfree(old_slots); - r = -EFAULT; - if (copy_in_user(log->dirty_bitmap, dirty_bitmap, n)) - goto out; + if (log_bitmap) { + r = -EFAULT; + if (copy_in_user(log_bitmap, dirty_bitmap, n)) + goto out; + } } else { - r = -EFAULT; - if (clear_user(log->dirty_bitmap, n)) + if (log_bitmap) { + r = -EFAULT; + if (clear_user(log_bitmap, n)) + goto out; + } else { + /* Tell the user about no switch. */ + r = 1; goto out; + } } r = 0; @@ -2803,6 +2811,16 @@ out: return r; } +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +{ + return kvm_x86_update_dirty_log(kvm, log->slot, log->dirty_bitmap); +} + +int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot) +{ + return kvm_x86_update_dirty_log(kvm, slot, NULL); +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 23ea022..47980c2 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -322,6 +322,14 @@ struct kvm_dirty_log { }; }; +/* for KVM_GET_USER_DIRTY_LOG_ADDR */ +struct kvm_user_dirty_log { + __u32 slot; + __u32 flags; + __u64 dirty_bitmap; + __u64 dirty_bitmap_old; +}; + /* for KVM_SET_SIGNAL_MASK */ struct kvm_signal_mask { __u32 len; @@ -524,6 +532,7 @@ struct kvm_enable_cap { #define KVM_CAP_PPC_OSI 52 #define KVM_CAP_PPC_UNSET_IRQ 53 #define KVM_CAP_ENABLE_CAP 54 +#define KVM_CAP_USER_DIRTY_LOG 55 #ifdef KVM_CAP_IRQ_ROUTING @@ -620,6 +629,8 @@ struct kvm_clock_data { struct kvm_userspace_memory_region) #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) +#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO, 0x49, struct kvm_user_dirty_log) +#define KVM_SWITCH_DIRTY_LOG _IO(KVMIO, 0x4a) /* Device model IOC */ #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60) #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c95e2b7..a94277a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -337,9 +337,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp, int kvm_dev_ioctl_check_extension(long ext); -int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log); +int kvm_update_dirty_log(struct kvm *kvm, int slot, + unsigned long __user *log_bitmap); int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log); +int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot); int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, struct diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index ddcf65a..300a0c1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -818,26 +818,55 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, return kvm_set_memory_region(kvm, mem, user_alloc); } -int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +static int kvm_vm_ioctl_get_user_dirty_log_addr(struct kvm *kvm, + struct kvm_user_dirty_log *log) +{ + struct kvm_memory_slot *memslot; + + if (log->slot >= KVM_MEMORY_SLOTS) + return -EINVAL; + + memslot = &kvm->memslots->memslots[log->slot]; + if (!memslot->dirty_bitmap) + return -ENOENT; + + log->dirty_bitmap = (unsigned long)memslot->dirty_bitmap; + log->dirty_bitmap_old = (unsigned long)memslot->dirty_bitmap_old; + return 0; +} + +int kvm_update_dirty_log(struct kvm *kvm, int slot, + unsigned long __user *log_bitmap) { struct kvm_memory_slot *memslot; int r; - unsigned long n; r = -EINVAL; - if (log->slot >= KVM_MEMORY_SLOTS) + if (slot >= KVM_MEMORY_SLOTS) goto out; - memslot = &kvm->memslots->memslots[log->slot]; + memslot = &kvm->memslots->memslots[slot]; r = -ENOENT; if (!memslot->dirty_bitmap) goto out; - n = kvm_dirty_bitmap_bytes(memslot); + if (log_bitmap) { + unsigned long n = kvm_dirty_bitmap_bytes(memslot); - r = -EFAULT; - if (copy_in_user(log->dirty_bitmap, memslot->dirty_bitmap, n)) + r = -EFAULT; + if (copy_in_user(log_bitmap, memslot->dirty_bitmap, n)) + goto out; + } else if (memslot->is_dirty) { + unsigned long __user *dirty_bitmap; + + dirty_bitmap = memslot->dirty_bitmap; + memslot->dirty_bitmap = memslot->dirty_bitmap_old; + memslot->dirty_bitmap_old = dirty_bitmap; + } else { + /* Tell the user about no switch. */ + r = 1; goto out; + } r = 0; out: @@ -1647,6 +1676,25 @@ static long kvm_vm_ioctl(struct file *filp, goto out; break; } + case KVM_GET_USER_DIRTY_LOG_ADDR: { + struct kvm_user_dirty_log log; + + r = -EFAULT; + if (copy_from_user(&log, argp, sizeof log)) + goto out; + r = kvm_vm_ioctl_get_user_dirty_log_addr(kvm, &log); + if (r) + goto out; + r = -EFAULT; + if (copy_to_user(argp, &log, sizeof log)) + goto out; + r = 0; + break; + } + case KVM_SWITCH_DIRTY_LOG: { + r = kvm_vm_ioctl_switch_dirty_log(kvm, arg); + break; + } #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET case KVM_REGISTER_COALESCED_MMIO: { struct kvm_coalesced_mmio_zone zone; @@ -1823,6 +1871,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg) case KVM_CAP_USER_MEMORY: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: + case KVM_CAP_USER_DIRTY_LOG: #ifdef CONFIG_KVM_APIC_ARCHITECTURE case KVM_CAP_SET_BOOT_CPU_ID: #endif -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ia64" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html