Re: [PATCH v8 00/11] KVM: x86/mmu: Age sptes locklessly

Yu Zhao <yuzhao@xxxxxxxxxx> · Tue, 5 Nov 2024 12:21:05 -0700

On Tue, Nov 5, 2024 at 11:43 AM James Houghton <jthoughton@xxxxxxxxxx> wrote:
>
> Andrew has queued patches to make MGLRU consult KVM when doing aging[8].
> Now, make aging lockless for the shadow MMU and the TDP MMU. This allows
> us to reduce the time/CPU it takes to do aging and the performance
> impact on the vCPUs while we are aging.
>
> The final patch in this series modifies access_tracking_stress_test to
> age using MGLRU. There is a mode (-p) where it will age while the vCPUs
> are faulting memory in. Here are some results with that mode:

Additional background in case I didn't provide it before:

At Google we keep track of hotness/coldness of VM memory to identify
opportunities to demote cold memory into slower tiers of storage. This
is done in a controlled manner so that we benefit from the memory
efficiency gained through improved bin-packing without violating
customer SLOs.

However, the monitoring/tracking introduced two major overheads [1] for us:
1. the traditional (host) PFN + rmap data structures [2] used to
locate host PTEs (containing the accessed bits).
2. the KVM MMU lock required to clear the accessed bits in
secondary/shadow PTEs.

MGLRU provides the infrastructure for us to reach out into page tables
directly from a list of mm_struct's, and therefore allows us to bypass
the first problem above and reduce the CPU overhead by ~80% for our
workloads (90%+ mmapped memory). This series solves the second problem:
by supporting lockless clearing of the accessed bits in SPTEs, it would
reduce our current KVM MMU lock contention by >80% [3]. All other
existing mechanisms, e.g., Idle Page Tracking, DAMON, etc., can also
seamlessly benefit from this series when monitoring/tracking VM
memory.
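
To illustrate what clearing an accessed bit locklessly means, here is a
minimal userspace sketch using C11 atomics. It is not code from this
series: the bit position and the names DEMO_ACCESSED_BIT and
demo_test_and_clear_young() are made up for illustration, and real SPTE
layouts and the KVM helpers differ.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only; real SPTEs differ. */
#define DEMO_ACCESSED_BIT (UINT64_C(1) << 5)

/*
 * Atomically clear the accessed bit and report whether it was set.
 * No lock is taken: the read-modify-write is a single atomic operation,
 * so concurrent updates to the other bits of the entry are not lost.
 */
static bool demo_test_and_clear_young(_Atomic uint64_t *entry)
{
	uint64_t old = atomic_fetch_and(entry, ~DEMO_ACCESSED_BIT);

	return (old & DEMO_ACCESSED_BIT) != 0;
}
```

An aging pass built on something like this can walk entries and call the
helper on each one without serializing against a lock that the vCPU
fault path also needs, which is where the contention in problem 2 comes
from.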

[1] https://lwn.net/Articles/787611/
[2] https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
[3] https://research.google/pubs/profiling-a-warehouse-scale-computer/