Re: [PATCH 00/23] Extend Eager Page Splitting to the shadow MMU

Hi, David,

Sorry for a very late comment.

On Thu, Feb 03, 2022 at 01:00:28AM +0000, David Matlack wrote:
> Performance
> -----------
> 
> Eager page splitting moves the cost of splitting huge pages off of the
> vCPU thread and onto the thread invoking VM-ioctls to configure dirty
> logging. This is useful because:
> 
>  - Splitting on the vCPU thread interrupts vCPU execution and is
>    disruptive to customers, whereas splitting on VM ioctl threads can
>    run in parallel with vCPU execution.
> 
>  - Splitting on the VM ioctl thread is more efficient because it does
>    not require performing VM-exit handling and page table walks for
>    every 4KiB page.
> 
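Just to make this concrete for archive readers: below is a toy userspace
sketch of the "split up front on the ioctl thread" idea. It is my own
illustration, not code from the series, and all names are invented.

  #include <stdlib.h>

  #define PTES_PER_TABLE 512

  struct spte {
          unsigned long pfn;
          int is_huge;
          struct spte *child;     /* lower-level table when !is_huge */
  };

  /* Replace one huge SPTE with a table of 512 smaller SPTEs
   * (error handling omitted for brevity). */
  static void split_huge_spte(struct spte *huge)
  {
          struct spte *table = calloc(PTES_PER_TABLE, sizeof(*table));
          int i;

          for (i = 0; i < PTES_PER_TABLE; i++)
                  table[i].pfn = huge->pfn + i;
          huge->is_huge = 0;
          huge->child = table;
  }

  /*
   * Eager path: the ioctl thread walks the range once and splits
   * everything up front, so no vCPU ever has to fault and split a
   * huge page while holding the MMU lock.
   */
  static void eager_split_all(struct spte *sptes, size_t n)
  {
          size_t i;

          for (i = 0; i < n; i++)
                  if (sptes[i].is_huge)
                          split_huge_spte(&sptes[i]);
  }
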
> To measure the performance impact of Eager Page Splitting I ran
> dirty_log_perf_test with tdp_mmu=N, various vCPU counts, and 1GiB per
> vCPU backed by 1GiB HugeTLB pages.
> 
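(For reference, dirty_log_perf_test lives in tools/testing/selftests/kvm;
the setup above should correspond to something like
"./dirty_log_perf_test -v 4 -b 1G -s anonymous_hugetlb_1gb" with the kvm
module's tdp_mmu=N and eager_page_split toggled. The exact flag values
here are my assumption, not taken from the cover letter.)
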
> To measure the impact on customer performance, we can look at the time
> it takes all vCPUs to dirty memory after dirty logging has been
> enabled. Without Eager Page Splitting enabled, dirtying memory
> requires taking faults to split huge pages, which bottlenecks on the
> MMU lock.
> 
>              | "Iteration 1 dirty memory time"             |
>              | ------------------------------------------- |
> vCPU Count   | eager_page_split=N   | eager_page_split=Y   |
> ------------ | -------------------- | -------------------- |
> 2            | 0.310786549s         | 0.058731929s         |
> 4            | 0.419165587s         | 0.059615316s         |
> 8            | 1.061233860s         | 0.060945457s         |
> 16           | 2.852955595s         | 0.067069980s         |
> 32           | 7.032750509s         | 0.078623606s         |
> 64           | 16.501287504s        | 0.083914116s         |
> 
> Eager Page Splitting does increase the time it takes to enable dirty
> logging when not using initially-all-set, since that's when KVM splits
> huge pages. However, this runs in parallel with vCPU execution and does
> not bottleneck on the MMU lock.
> 
>              | "Enabling dirty logging time"               |
>              | ------------------------------------------- |
> vCPU Count   | eager_page_split=N   | eager_page_split=Y   |
> ------------ | -------------------- | -------------------- |
> 2            | 0.001581619s         |  0.025699730s        |
> 4            | 0.003138664s         |  0.051510208s        |
> 8            | 0.006247177s         |  0.102960379s        |
> 16           | 0.012603892s         |  0.206949435s        |
> 32           | 0.026428036s         |  0.435855597s        |
> 64           | 0.103826796s         |  1.199686530s        |
> 
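To make "enabling dirty logging" concrete: from userspace it is the
memslot update sketched below, and with this series the splitting work
happens inside that ioctl. The helper name and parameters are mine; only
the flag, struct, and ioctl are the real KVM userspace API.

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /*
   * Turn on dirty logging for one existing memslot.  When
   * initially-all-set is not in use, KVM now splits the slot's huge
   * pages inside this ioctl instead of on later vCPU faults.
   */
  static int enable_dirty_logging(int vm_fd, __u32 slot, __u64 gpa,
                                  __u64 size, __u64 hva)
  {
          struct kvm_userspace_memory_region region = {
                  .slot            = slot,
                  .flags           = KVM_MEM_LOG_DIRTY_PAGES,
                  .guest_phys_addr = gpa,
                  .memory_size     = size,
                  .userspace_addr  = hva,
          };

          return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
  }
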
> Similarly, Eager Page Splitting increases the time it takes to clear the
> dirty log when using initially-all-set. The first time userspace
> clears the dirty log, KVM will split huge pages:
> 
>              | "Iteration 1 clear dirty log time"          |
>              | ------------------------------------------- |
> vCPU Count   | eager_page_split=N   | eager_page_split=Y   |
> ------------ | -------------------- | -------------------- |
> 2            | 0.001544730s         | 0.055327916s         |
> 4            | 0.003145920s         | 0.111887354s         |
> 8            | 0.006306964s         | 0.223920530s         |
> 16           | 0.012681628s         | 0.447849488s         |
> 32           | 0.026827560s         | 0.943874520s         |
> 64           | 0.090461490s         | 2.664388025s         |
> 
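For completeness, "initially-all-set" refers to the
KVM_DIRTY_LOG_INITIALLY_SET mode of manual dirty-log clearing. A minimal
sketch of how userspace opts in (the helper name is mine, the capability
and flags are the real KVM API):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /*
   * Opt in to manual dirty-log clearing with all dirty bits initially
   * set, so huge pages are split on the first KVM_CLEAR_DIRTY_LOG
   * covering them rather than when logging is enabled.
   */
  static int enable_initially_all_set(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap     = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                  .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                             KVM_DIRTY_LOG_INITIALLY_SET,
          };

          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }
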
> Subsequent calls to clear the dirty log incur almost no additional cost
> since KVM can very quickly determine there are no more huge pages to
> split via the RMAP. This is unlike the TDP MMU which must re-traverse
> the entire page table to check for huge pages.
> 
>              | "Iteration 2 clear dirty log time"          |
>              | ------------------------------------------- |
> vCPU Count   | eager_page_split=N   | eager_page_split=Y   |
> ------------ | -------------------- | -------------------- |
> 2            | 0.015613726s         | 0.015771982s         |
> 4            | 0.031456620s         | 0.031911594s         |
> 8            | 0.063341572s         | 0.063837403s         |
> 16           | 0.128409332s         | 0.127484064s         |
> 32           | 0.255635696s         | 0.268837996s         |
> 64           | 0.695572818s         | 0.700420727s         |
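
On the rmap point: KVM keeps one rmap array per mapping level for each
memslot, so the "anything left to split?" re-check only has to scan the
2MiB and 1GiB arrays. A simplified model of that cost argument (my own,
not the actual KVM structures):

  #include <stddef.h>

  /* One entry per potential mapping at each huge-page level. */
  struct memslot_rmaps {
          unsigned long *rmap_2m;
          size_t n_2m;            /* slot size / 2MiB */
          unsigned long *rmap_1g;
          size_t n_1g;            /* slot size / 1GiB */
  };

  /*
   * Cost is n_2m + n_1g regardless of how many 4KiB mappings exist;
   * a page-table walk (as in the TDP MMU) would have to revisit
   * every PTE instead.
   */
  static int any_huge_mappings(const struct memslot_rmaps *m)
  {
          size_t i;

          for (i = 0; i < m->n_2m; i++)
                  if (m->rmap_2m[i])
                          return 1;
          for (i = 0; i < m->n_1g; i++)
                  if (m->rmap_1g[i])
                          return 1;
          return 0;
  }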

Are all the tests above with ept=Y (except the one below)?

> 
> Eager Page Splitting also improves performance for shadow paging
> configurations, as measured with ept=N. The absolute gains are
> smaller, though, since ept=N requires taking the MMU lock to track
> writes to 4KiB pages (i.e. no fast_page_fault() or PML), and that
> dominates the dirty memory time.
> 
>              | "Iteration 1 dirty memory time"             |
>              | ------------------------------------------- |
> vCPU Count   | eager_page_split=N   | eager_page_split=Y   |
> ------------ | -------------------- | -------------------- |
> 2            | 0.373022770s         | 0.348926043s         |
> 4            | 0.563697483s         | 0.453022037s         |
> 8            | 1.588492808s         | 1.524962010s         |
> 16           | 3.988934732s         | 3.369129917s         |
> 32           | 9.470333115s         | 8.292953856s         |
> 64           | 20.086419186s        | 18.531840021s        |

This one is definitely for ept=N, since that's written there. It's a
~10% performance increase, which still looks good, but IMHO the increase
is "debatable" since a normal guest may not simply write over the whole
guest memory, so the 10% figure is based on some assumptions.

What if the guest writes 80% and reads 20%?  IIUC the split thread will
then also start to block the readers for the shadow MMU, while they were
not blocked previously.  From that point of view, I'm not sure whether
the series needs some more justification, as the changeset is still
large.

Are there other benefits besides the ~10% increase on writes?

Thanks,

-- 
Peter Xu
