Hi,

This is to revive the RFC series[1] sent out by Zhu Keqian some time back,
which makes use of the hardware Dirty Bit Modifier (DBM) feature
(FEAT_HAFDBS) for dirty page tracking.

One of the main drawbacks of using the hardware DBM feature for dirty
page tracking is the additional overhead of scanning the PTEs for dirty
pages[2]. Also, there are no vCPU page faults once the DBM bit is set,
which may result in a longer convergence time during guest migration.

This series tries to reduce these overheads by not setting the DBM bit
for all writeable pages during migration, and instead uses a combined
software (the current page-fault mechanism) and hardware (DBM) approach
for dirty page tracking.

As noted in RFC v1[1],

"The core idea is that we do not enable hardware dirty at start (do not
add DBM bit). When an arbitrary PT occurs fault, we execute soft tracking
for this PT and enable hardware tracking for its *nearby* PTs (e.g. Add
DBM bit for nearby 64PTs). Then when sync dirty log, we have known all
PTs with hardware dirty enabled, so we do not need to scan all PTs."

(A purely illustrative sketch of this is included after the Testing
section below.)

Major changes from RFC v1:

1. Rebased to 6.5-rc5 + the FEAT_TLBIRANGE series[3]. The original RFC v1
   was based on 5.11, and there have been multiple changes in KVM/arm64
   since then that fundamentally changed the way the page tables are
   updated. I am not 100% sure that I got all the locking right during
   page table traversal here, but I haven't seen any regressions or
   memory corruption so far in my test setup.

2. Use of ctx->flags for handling DBM updates (patch #2).

3. During migration, we can only set DBM for pages that are already
   writeable, but the CLEAR_LOG path write-protects all pages and there
   is no easy way to distinguish previously read-only pages from these
   write-protected ones. Hence, one of the "Reserved for Software use"
   bits in the page descriptor is used to mark "writeable-clean" pages.
   See patch #4.

4. Introduced KVM_CAP_ARM_HW_DBM for enabling this feature from
   userspace.

Testing
-------
Hardware: HiSilicon ARM64 platform (without FEAT_TLBIRANGE)
Kernel:   6.5-rc5 based, with eager page split explicitly enabled
          (chunksize=2MB)

Tests with dirty_log_perf_test using anonymous THP pages show a
significant improvement in "dirty memory time", as expected, but with a
hit on "get dirty log time".

./dirty_log_perf_test -b 512MB -v 96 -i 5 -m 2 -s anonymous_thp

+---------------------------+----------------+------------------+
|                           |    6.5-rc5     | 6.5-rc5 + series |
|                           |      (s)       |       (s)        |
+---------------------------+----------------+------------------+
| dirty memory time         |      4.22      |       0.41       |
| get dirty log time        |     0.00047    |       3.25       |
| clear dirty log time      |      0.48      |       0.98       |
+---------------------------------------------------------------+

To get some idea of actual live migration performance, I created a VM
(96 vCPUs, 1GB), ran a redis-benchmark test and, while the test was in
progress, initiated a (local) live migration.

redis-benchmark -t set -c 900 -n 5000000 --threads 96

The average of 5 runs shows that the benchmark finishes ~10% faster,
with a ~8% increase in the "total time" for migration.

+---------------------------+----------------+------------------+
|                           |    6.5-rc5     | 6.5-rc5 + series |
|                           |      (s)       |       (s)        |
+---------------------------+----------------+------------------+
| [redis]5000000 requests in|     79.428     |      71.49       |
| [info migrate]total time  |      8438      |      9097        |
+---------------------------------------------------------------+

I also ran extensive VM migrations with a QEMU that calculates an md5
checksum of RAM. No regressions or memory corruption observed so far.
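For reviewers who want a feel for the approach without the patches in
front of them, here is a purely illustrative sketch of the core idea
described above (this is NOT code from the series; the helper names
stage2_set_dbm_window() and stage2_pte_writeable_clean(), the
DBM_WINDOW_PTES window size and the S2_DESC_DBM define are all made up
for the example):

/*
 * Illustrative sketch only: on a write fault the faulting entry is
 * dirtied via the existing software path, and DBM is then set on a
 * window of neighbouring writeable-clean leaf entries so that later
 * writes to them are tracked by hardware without faulting.
 */
#define DBM_WINDOW_PTES		64		/* "nearby" PTs, per RFC v1 */
#define S2_DESC_DBM		BIT(51)		/* stage-2 descriptor DBM bit */

static void stage2_set_dbm_window(kvm_pte_t *ptep, unsigned long idx)
{
	unsigned long start = round_down(idx, DBM_WINDOW_PTES);
	unsigned long i;

	for (i = start; i < start + DBM_WINDOW_PTES; i++) {
		kvm_pte_t pte = READ_ONCE(ptep[i]);

		/*
		 * Only valid, writeable-clean leaf entries get the DBM
		 * bit; everything else keeps using the fault path.
		 */
		if (!kvm_pte_valid(pte) || !stage2_pte_writeable_clean(pte))
			continue;

		WRITE_ONCE(ptep[i], pte | S2_DESC_DBM);
	}
}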
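Similarly, assuming KVM_CAP_ARM_HW_DBM is enabled per-VM through the
standard KVM_ENABLE_CAP ioctl (the capability only exists with this
series applied, and the exact enable semantics are of course open to
discussion), userspace usage could look roughly like this:

/* Hypothetical VMM snippet; KVM_CAP_ARM_HW_DBM comes from this series. */
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_hw_dbm(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_ARM_HW_DBM,
	};

	/* Returns 0 on success, -1 with errno set on failure. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}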
It looks like this series will benefit VMs with write-intensive workloads
by improving guest uptime during migration.

Please take a look and let me know your feedback. Any help with further
tests and verification is really appreciated.

Thanks,
Shameer

1. https://lore.kernel.org/linux-arm-kernel/20210126124444.27136-1-zhukeqian1@xxxxxxxxxx/
2. https://lore.kernel.org/linux-arm-kernel/20200525112406.28224-1-zhukeqian1@xxxxxxxxxx/
3. https://lore.kernel.org/kvm/20230811045127.3308641-1-rananta@xxxxxxxxxx/

Keqian Zhu (5):
  arm64: cpufeature: Add API to report system support of HWDBM
  KVM: arm64: Add some HW_DBM related pgtable interfaces
  KVM: arm64: Add some HW_DBM related mmu interfaces
  KVM: arm64: Only write protect selected PTE
  KVM: arm64: Start up SW/HW combined dirty log

Shameer Kolothum (3):
  KVM: arm64: Add KVM_PGTABLE_WALK_HW_DBM for HW DBM support
  KVM: arm64: Set DBM for writeable-clean pages
  KVM: arm64: Add KVM_CAP_ARM_HW_DBM

 arch/arm64/include/asm/cpufeature.h  |  15 +++
 arch/arm64/include/asm/kvm_host.h    |   8 ++
 arch/arm64/include/asm/kvm_mmu.h     |   7 ++
 arch/arm64/include/asm/kvm_pgtable.h |  53 ++++++++++
 arch/arm64/kernel/image-vars.h       |   2 +
 arch/arm64/kvm/arm.c                 | 138 ++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 139 +++++++++++++++++++++++++--
 arch/arm64/kvm/mmu.c                 |  50 +++++++++-
 include/uapi/linux/kvm.h             |   1 +
 9 files changed, 403 insertions(+), 10 deletions(-)

-- 
2.34.1