Re: [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM

On 2020-05-25 12:23, Keqian Zhu wrote:
> This patch series adds support for stage2 hardware DBM, and it is
> only used for dirty logging for now.
>
> It works well under some migration test cases, including a VM with
> 4K pages or 2M THP. I checked the SHA256 hash digest of all memory,
> and it stays the same for the source VM and the destination VM,
> which means no dirty pages are missed under hardware DBM.
>
> However, there are some known issues not yet solved.
>
> 1. Some mechanisms that rely on the "write permission fault" become
>    invalid, such as kvm_set_pfn_dirty and "mmap page sharing".
>
>    kvm_set_pfn_dirty is called in user_mem_abort when the guest
>    takes a write fault. This guarantees the physical page will not
>    be dropped directly when the host kernel reclaims memory. After
>    switching to hardware dirty management, we no longer have a
>    chance to call kvm_set_pfn_dirty.

Then you will end up with memory corruption under memory pressure.
This also breaks things like CoW, which we depend on.
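
For reference, here is roughly what the fault-driven path that DBM
bypasses does. This is a heavily simplified sketch of the logic in
user_mem_abort(), not the actual code; stage2_make_writable() is a
made-up stand-in for the real stage-2 PTE update:

/*
 * Sketch only: on a stage-2 write permission fault, KVM makes the
 * mapping writable and, crucially, tells the core mm and the dirty
 * log about the write. With hardware DBM there is no fault, so none
 * of this runs.
 */
static int write_fault_sketch(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
{
	/* Hypothetical helper: upgrade the stage-2 PTE to writable. */
	stage2_make_writable(kvm, gfn);

	/*
	 * Mark the backing page dirty so the host writes it back
	 * instead of dropping it under memory pressure.
	 */
	kvm_set_pfn_dirty(pfn);

	/* Record the gfn in the memslot's dirty bitmap for migration. */
	mark_page_dirty(kvm, gfn);

	return 0;
}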


   For "mmap page sharing" mechanism, host kernel will allocate a new
physical page when guest writes a page that is shared with other page table entries. After using hardware dirty management, we have no chance
   to do this too.

I need to do some survey on how stage1 hardware DBM solve these problems.
   It helps if anyone can figure it out.

2. Page Table Modification Races: Though I have found and solved some data races when kernel changes page table entries, I still doubt that there are data races I am not aware of. It's great if anyone can figure them out.

3. Performance: Under Kunpeng 920 platform, for every 64GB memory, KVM
consumes about 40ms to traverse all PTEs to collect dirty log. It will cause unbearable downtime for migration if memory size is too big. I will
   try to solve this problem in Patch v1.

This, in my opinion, is why Stage-2 DBM is fairly useless.
From a performance perspective, this is the worst possible
situation. You end up continuously scanning page tables, at
an arbitrary rate, without a way to evaluate the fault rate.
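
To make the cost concrete: with DBM, "collecting the dirty log" means
walking every last-level stage-2 PTE and checking whether the hardware
has turned a DBM-enabled entry writable. A minimal sketch, assuming
descriptor bit[51] is DBM and bit[7] is S2AP[1] (the write-permission
bit the hardware sets on a write), with made-up helper names and none
of the TLB maintenance or atomic PTE updates the real thing needs:

/* Armv8.1 hardware dirty state: a DBM entry that the hardware has
 * made writable has been written to since it was last write-protected. */
#define S2_PTE_DBM	(1UL << 51)
#define S2_PTE_WRITE	(1UL << 7)

static bool s2_pte_hw_dirty(u64 pte)
{
	return (pte & S2_PTE_DBM) && (pte & S2_PTE_WRITE);
}

/* Sketch: scan a run of last-level PTEs, write-protect the dirty
 * entries again and report them. The real code would need cmpxchg
 * (the walk races with the MMU setting the bit) and TLB invalidation
 * after clearing write permission. */
static void sync_dirty_log_sketch(struct kvm *kvm, u64 *ptep,
				  gfn_t gfn, unsigned long nr)
{
	unsigned long i;

	for (i = 0; i < nr; i++, gfn++) {
		u64 pte = READ_ONCE(ptep[i]);

		if (!s2_pte_hw_dirty(pte))
			continue;

		WRITE_ONCE(ptep[i], pte & ~S2_PTE_WRITE);
		mark_page_dirty(kvm, gfn);
	}
}

The walk touches every PTE regardless of how many pages were actually
written, which is why the cost scales with guest memory size (the
~40ms per 64GB quoted above) rather than with the dirty rate.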

One thing S2-DBM would be useful for is SVA, where a device
write would mark the S2 PTs dirty as they are shared between
CPU and SMMU. Another thing is SPE, which is essentially a DMA
agent using the CPU's PTs.

But on its own, and just to log the dirty pages, S2-DBM is
pretty rubbish. I wish arm64 had something like Intel's PML,
which looks far more interesting for the purpose of tracking
accesses.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


