Hi Catalin and Marc,

On 2020/7/2 21:55, Keqian Zhu wrote:
> This patch series adds support for dirty log based on HW DBM.
>
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory, and
> it is identical for the source VM and the destination VM, which means
> no dirty page is missed under hardware DBM.
>
> Some key points:
>
> 1. Only hardware updates of dirty status for PTEs are supported. PMDs
>    and PUDs are not involved for now.
>
> 2. About *performance*: In the RFC patch, I mentioned that for every
>    64GB of memory, KVM consumes about 40ms to scan all PTEs to collect
>    the dirty log. This series solves that problem in two ways: HW/SW
>    dynamic switch and multi-core offload.
>
>    HW/SW dynamic switch: Give userspace the right to enable/disable
>    hw dirty log. This adds a new KVM cap named KVM_CAP_ARM_HW_DIRTY_LOG.
>    We achieve this by changing the kvm->arch.vtcr value and kicking
>    vCPUs out to reload this value into VTCR_EL2. Userspace can then
>    enable hw dirty log at the beginning and disable it when few pages
>    are dirty and the VM is about to stop, so VM downtime is not
>    affected.
>
>    Multi-core offload: Offloading the PT scanning workload to multiple
>    cores can greatly reduce the scanning time. To ensure we complete
>    in time, I use smp_call_function to realize this policy, which uses
>    IPIs to dispatch the workload to other CPUs. On a 128-CPU Kunpeng
>    920 platform, it takes only about 5ms to scan the PTs of 256GB of
>    RAM (using mempress, with almost all PTs established). We dispatch
>    the workload iteratively (each CPU scans the PTs of just 512M of
>    RAM per iteration), so the physical CPUs are not seriously
>    affected.

What do you think of these two methods for solving the high cost of PT
scanning? Maybe you are waiting for a PML-like feature on ARM :-), but
in my tests, DBM is usable once these two methods are applied. Rough
sketches of both methods, and of the correctness handling in points 3
and 4, are appended after the quoted diffstat below.

Thanks,
Keqian

> 3. About correctness: The DBM bit is only added when a PTE is already
>    writable, so we still have read-only PTEs, and mechanisms that rely
>    on read-only PTs are not broken.
>
> 4. About PT modification races: There are two kinds of PT
>    modification.
>
>    The first is adding or clearing a specific bit, such as AF or RW.
>    All these operations have been converted to be atomic, to avoid
>    covering the dirty status set by hardware.
>
>    The second is replacement, such as PTE unmapping or modification.
>    All these operations ultimately invoke kvm_set_pte. kvm_set_pte
>    has been converted to be atomic, and we save the dirty status to
>    the underlying bitmap if it would otherwise be covered.
>
> Change log:
>
> v2:
>  - Address Steven's comments.
>  - Add support for parallel dirty log sync.
>  - Simplify and merge patches of v1.
>
> v1:
>  - Address Catalin's comments.
>
> Keqian Zhu (8):
>   KVM: arm64: Set DBM bit for writable PTEs
>   KVM: arm64: Scan PTEs to sync dirty log
>   KVM: arm64: Modify stage2 young mechanism to support hw DBM
>   KVM: arm64: Save stage2 PTE dirty status if it is covered
>   KVM: arm64: Steply write protect page table by mask bit
>   KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
>   KVM: arm64: Sync dirty log parallel
>   KVM: Omit dirty log sync in log clear if initially all set
>
>  arch/arm64/include/asm/kvm_host.h |   5 +
>  arch/arm64/include/asm/kvm_mmu.h  |  43 ++++-
>  arch/arm64/kvm/arm.c              |  45 ++++-
>  arch/arm64/kvm/mmu.c              | 307 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/reset.c            |   5 +
>  include/uapi/linux/kvm.h          |   1 +
>  tools/include/uapi/linux/kvm.h    |   1 +
>  virt/kvm/kvm_main.c               |   3 +-
>  8 files changed, 389 insertions(+), 21 deletions(-)
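
PS: As promised above, here are rough sketches of the ideas. They are
simplified for discussion and the helper names are illustrative; they
are not taken verbatim from the patches.

The HW/SW dynamic switch flips the HA/HD bits in the saved VTCR value
and kicks every vCPU out of the guest (the VTCR_EL2.HA/HD bit positions
follow the ARM ARM):

/* Simplified sketch, not the exact patch code. */
#define VTCR_EL2_HA	(UL(1) << 21)	/* HW Access flag updates */
#define VTCR_EL2_HD	(UL(1) << 22)	/* HW Dirty state updates */

static void kvm_set_hw_dirty_log(struct kvm *kvm, bool enable)
{
	struct kvm_vcpu *vcpu;
	int i;

	if (enable)
		kvm->arch.vtcr |= VTCR_EL2_HA | VTCR_EL2_HD;
	else
		kvm->arch.vtcr &= ~VTCR_EL2_HD;

	/*
	 * Force every vCPU out of the guest: the next world switch
	 * reloads kvm->arch.vtcr into VTCR_EL2, so the new setting
	 * takes effect without restarting the VM.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_vcpu_kick(vcpu);
}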
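
The multi-core offload hands out fixed-size chunks of IPA space to all
CPUs; scan_stage2_ptes() is a stand-in for the real PTE walker, and the
chunk size matches the 512M-per-iteration policy described above:

/* Simplified sketch, not the exact patch code. */
#define SYNC_CHUNK_SIZE	SZ_512M		/* one chunk per CPU per round */

struct sync_dirty_ctx {
	struct kvm *kvm;
	atomic64_t next;	/* next IPA chunk to claim */
	phys_addr_t end;
};

static void sync_dirty_fn(void *info)
{
	struct sync_dirty_ctx *ctx = info;
	phys_addr_t start;

	/* Each CPU repeatedly claims the next 512M chunk until done. */
	while ((start = atomic64_fetch_add(SYNC_CHUNK_SIZE, &ctx->next)) < ctx->end)
		scan_stage2_ptes(ctx->kvm, start,
				 min(start + SYNC_CHUNK_SIZE, ctx->end));
}

static void kvm_sync_dirty_log_parallel(struct kvm *kvm,
					phys_addr_t start, phys_addr_t end)
{
	struct sync_dirty_ctx ctx = {
		.kvm  = kvm,
		.next = ATOMIC64_INIT(start),
		.end  = end,
	};

	/*
	 * on_each_cpu() IPIs the other CPUs through smp_call_function()
	 * and also runs the scan locally, waiting until all are done.
	 */
	on_each_cpu(sync_dirty_fn, &ctx, 1);
}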
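
Finally, the gist of points 3 and 4. DBM is bit 51 of the stage-2
descriptor and S2AP[1] (write permission) is bit 7; the extra kvm/gfn
parameters of kvm_set_pte() are for illustration only:

/* Simplified sketch, not the exact patch code. */
static pte_t stage2_pte_mkdbm(pte_t pte)
{
	/*
	 * Point 3: only an already-writable PTE gets the DBM bit, so a
	 * deliberately read-only PTE can never become writable behind
	 * our back.
	 */
	if ((pte_val(pte) & PTE_S2_RDWR) == PTE_S2_RDWR)
		pte_val(pte) |= PTE_DBM;
	return pte;
}

static void kvm_set_pte(struct kvm *kvm, gfn_t gfn,
			pte_t *ptep, pte_t new_pte)
{
	/* Point 4: replace the PTE atomically so no HW update is lost. */
	pte_t old = __pte(xchg(&pte_val(*ptep), pte_val(new_pte)));

	/*
	 * A DBM-tagged PTE that is writable may have been made writable
	 * by hardware after write protection, i.e. the page is
	 * (conservatively) dirty; save that to the dirty bitmap before
	 * the old PTE value disappears.
	 */
	if ((pte_val(old) & PTE_DBM) &&
	    (pte_val(old) & PTE_S2_RDWR) == PTE_S2_RDWR)
		mark_page_dirty(kvm, gfn);
}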