Hi, David,

On Fri, Nov 19, 2021 at 11:57:44PM +0000, David Matlack wrote:
> This series is a first pass at implementing Eager Page Splitting for the
> TDP MMU. For context on the motivation and design of Eager Page
> Splitting, please see the RFC design proposal and discussion [1].
>
> Paolo, I went ahead and added splitting in both the initially-all-set
> case (only splitting the region passed to CLEAR_DIRTY_LOG) and the
> case where we are not using initially-all-set (splitting the entire
> memslot when dirty logging is enabled) to give you an idea of what
> both look like.
>
> Note: I will be on vacation all of next week so I will not be able to
> respond to reviews until Monday November 29. I thought it would be
> useful to seed discussion and reviews with an early version of the code
> rather than putting it off another week. But feel free to also ignore
> this until I get back :)
>
> This series compiles and passes the most basic splitting test:
>
>   $ ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 2 -i 4
>
> But please operate under the assumption that this code is probably
> buggy.
>
> [1] https://lore.kernel.org/kvm/CALzav=dV_U4r1K9oDq4esb4mpBQDQ2ROQ5zH5wV3KpOaZrRW-A@xxxxxxxxxxxxxx/#t

Will there be more numbers to show in the formal patchset?  It would be
interesting to see how "First Pass Dirty Memory Time" changes compared
to the RFC numbers; I can get a rough feel for it, but still. :)

Also, besides how much it speeds up dirty workloads in the guest, some
general measurement of how much it slows down KVM_SET_USER_MEMORY_REGION
(!init-all-set) or CLEAR_LOG (init-all-set) would be even nicer (for
CLEAR, I'd guess the 1st and 2nd+ rounds will have different overhead).

Besides that, I'm also wondering whether we should still have a knob for
it, in case there are use cases where eagerly splitting huge pages does
not help at all (a rough sketch of what I mean is appended after my
signature).  What I'm thinking:

  - Read-mostly guest workload: splitting huge pages will speed up the
    rare writes, but at the same time drag readers down due to the
    huge->small page mappings.

  - Writes-over-very-limited-region workload: say we have a 1T guest
    and the app in the guest only writes to a 10G part of it.  Hmm,
    not sure whether this exists..

  - Postcopy targeted: precopy may only run a few iterations just to
    send the static pages, so the migration duration will be relatively
    short, and the writes just won't spread much over the whole guest
    mem.

I don't really think any of these examples is strong enough, as they're
all very much corner cases, but they show why I wanted to raise the
question of whether an unconditional eager split is the best approach.

Thanks,

-- 
Peter Xu
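
P.S. To make the knob idea concrete, below is a rough, untested sketch
of the kind of thing I mean.  Everything in it (the parameter name
"eager_page_split", the helper kvm_mmu_split_huge_pages(), and the call
site) is made up purely for illustration and is not code from this
series:

        #include <linux/module.h>
        #include <linux/kvm_host.h>

        /* Hypothetical stand-in for the splitting helper in this series. */
        void kvm_mmu_split_huge_pages(struct kvm *kvm,
                                      struct kvm_memory_slot *slot);

        /*
         * Gate eager splitting behind a writable module parameter so
         * that workloads which do not benefit (e.g. read-mostly
         * guests) can opt out at runtime without rebuilding KVM.
         */
        static bool __read_mostly eager_page_split = true;
        module_param(eager_page_split, bool, 0644);

        /* Hypothetical call site when dirty logging is enabled on a slot. */
        static void kvm_mmu_slot_start_dirty_logging(struct kvm *kvm,
                                                     struct kvm_memory_slot *slot)
        {
                /* ... existing write-protect / dirty-bitmap setup ... */

                /* Only pay the splitting cost up front when the knob is on. */
                if (eager_page_split)
                        kvm_mmu_split_huge_pages(kvm, slot);
        }

With something like that the default keeps the new behavior, while the
corner cases above could flip it off.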