This patch series adds ring-based dirty memory tracking support for
performant checkpointing solutions. It can also be used by live
migration to improve predictability and performance for
memory-intensive workloads.

Introduction

Brendan Cully's Remus project white paper is one of the best written on
the subject of fault tolerance using checkpoint/rollback techniques and
is the best place to start for general background:

http://www.cs.ubc.ca/~andy/papers/remus-nsdi-final.pdf

It gives a great outline of the basic requirements and characteristics
of a checkpointed system, including a few of the performance issues.
However, Remus did not go far enough in the area of system performance
for commercial production use.

This patch series addresses a known bottleneck and limitation in a
checkpointed system: the use of large bitmaps to track dirty memory.
These bitmaps are copied to userspace when userspace queries KVM for
its dirty page information. Bitmaps make sense for live migration,
because all of memory may be dirtied between one log-dirty pass and the
next. In a checkpointed system, however, the number of dirty pages is
bounded: the VM is paused once it has dirtied a pre-defined number of
pages. Traversing a large, sparsely populated bitmap to find the set
bits is time-consuming, as is copying the bitmap to userspace.

The preferred data structure for performant checkpointing solutions is
a dense list of guest frame numbers (GFNs). This patch series stores
the dirty list in kernel memory that can be memory-mapped into
userspace to allow speedy harvesting. This modification, together with
further modifications to QEMU, has allowed us to run checkpoint cycles
at rates of up to 2500 per second while still allowing the VM to get
useful work done.

Design Goals

The patch series does not change or remove any existing KVM
functionality.
It only adds new functions (ioctls) that userspace can call into KVM,
and these changes coexist with the current dirty memory logging
facilities. It is possible to run multiple guests such that some of the
guests perform live migration using the existing memory logging
mechanism while others migrate or run in fault-tolerant mode using the
new memory tracking functions.

Modifications

All modifications affect only the KVM instance where the primary
(active) VM is running. They are not in play on the standby (passive)
host, where a VM matching the primary's configuration is created but
does not execute until a migration/failover event occurs.

Patch 1: KVM: make KVM_CAP_ENABLE_CAP_VM architecture agnostic
Patch 2: KVM: move running VCPU from ARM to common code
Patch 3: KVM: plumb userspace ABI for ring-based dirty memory tracking
Patch 4: KVM: add kvm/vcpu argument to mark_dirty_page_in_slot
Patch 5: KVM: Implement ring-based dirty memory tracking
Patch 6: KVM: x86: implement ring-based dirty memory tracking

 Documentation/virtual/kvm/api.txt | 109 +++++++++-
 arch/arm/include/asm/kvm_host.h   |   2 -
 arch/arm64/include/asm/kvm_host.h |   2 -
 arch/powerpc/kvm/powerpc.c        |  14 +-
 arch/s390/kvm/kvm-s390.c          |  11 +-
 arch/x86/include/asm/kvm_host.h   |   3 +
 arch/x86/include/uapi/asm/kvm.h   |   1 +
 arch/x86/kvm/Makefile             |   3 +-
 arch/x86/kvm/mmu.c                |   6 +
 arch/x86/kvm/vmx.c                |   7 +
 arch/x86/kvm/x86.c                |  20 +-
 include/linux/kvm_gfn_ring.h      |  68 +++++++
 include/linux/kvm_host.h          |  17 ++
 include/uapi/linux/kvm.h          |  33 ++++
 virt/kvm/arm/arm.c                |  30 ---
 virt/kvm/arm/perf.c               |   6 +-
 virt/kvm/arm/vgic/vgic-init.c     |   2 +-
 virt/kvm/arm/vgic/vgic-mmio.c     |   2 +-
 virt/kvm/gfn_ring.c               | 135 +++++++++++++
 virt/kvm/kvm_main.c               | 297 +++++++++++++++++++++++++++-
 20 files changed, 680 insertions(+), 88 deletions(-)
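
Illustration

The Introduction's argument (a sparse bitmap must be walked in full,
while a dense GFN list costs only as much as the number of dirty pages)
can be sketched in plain userspace C. This is a minimal,
single-threaded model and not the code in this series: struct gfn_ring,
gfn_ring_init(), gfn_ring_push(), gfn_ring_harvest() and
bitmap_harvest() are hypothetical names chosen for the example, and the
memory barriers and mmap() plumbing that a ring shared between kernel
and userspace would need are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Hypothetical dense dirty list: a fixed-size ring of guest frame numbers. */
struct gfn_ring {
    uint64_t *gfns;     /* storage for 'capacity' entries */
    uint32_t capacity;  /* must be a power of two */
    uint32_t head;      /* free-running index of the next slot to write */
    uint32_t tail;      /* free-running index of the next slot to read */
};

static void gfn_ring_init(struct gfn_ring *r, uint64_t *storage,
                          uint32_t capacity)
{
    /* A power-of-two capacity lets indices wrap with a simple mask. */
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    r->gfns = storage;
    r->capacity = capacity;
    r->head = 0;
    r->tail = 0;
}

/*
 * Producer side: record one dirtied GFN.
 * Returns 0 on success, -1 if the ring is full. In this model, "full" is
 * the point at which a checkpointing setup would pause the guest.
 */
static int gfn_ring_push(struct gfn_ring *r, uint64_t gfn)
{
    if (r->head - r->tail == r->capacity)
        return -1;
    r->gfns[r->head & (r->capacity - 1)] = gfn;
    r->head++;
    return 0;
}

/* Consumer side: cost is proportional to the number of dirty entries. */
static size_t gfn_ring_harvest(struct gfn_ring *r, uint64_t *out, size_t max)
{
    size_t n = 0;

    while (r->tail != r->head && n < max) {
        out[n++] = r->gfns[r->tail & (r->capacity - 1)];
        r->tail++;
    }
    return n;
}

/*
 * Bitmap harvest for comparison: every word covering nr_pages is visited,
 * no matter how few bits are set.
 */
static size_t bitmap_harvest(const uint64_t *bitmap, size_t nr_pages,
                             uint64_t *out, size_t max)
{
    size_t nr_words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;
    size_t n = 0;

    for (size_t w = 0; w < nr_words; w++) {
        uint64_t word = bitmap[w];

        if (!word)
            continue;
        for (unsigned bit = 0; bit < BITS_PER_WORD && n < max; bit++) {
            if (word & (1ULL << bit))
                out[n++] = (uint64_t)w * BITS_PER_WORD + bit;
        }
    }
    return n;
}
```

With two dirty pages in a 256-page guest, bitmap_harvest() still walks
all four bitmap words, whereas gfn_ring_harvest() touches exactly two
entries. The gap grows with guest memory size, since the bitmap scales
with total memory and the ring only with the per-checkpoint dirty
budget.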