Re: [PATCH 00/10] x86: PTE D Bit based dirty logging

On 10/23/2018 04:15 PM, Paolo Bonzini wrote:
> 
> FWIW I have played with another optimization of dirty logging, namely
> decoupling the retrieval and clearing of dirty bits.  There are many
> benefits: you avoid having to send the same page multiple times if it's
> modified by the guest between the retrieval of the dirty bitmap and the
> actual sending of the page; you therefore have fewer write protection
> faults, decreasing their overhead; and finally KVM_GET_DIRTY_LOG now
> need not take the mmu_lock at all.  The change is very simple, basically
> a new enable-able capability that makes KVM_GET_DIRTY_LOG *not* clear
> the dirty bitmap, and a new ioctl KVM_CLEAR_DIRTY_LOG that takes slot,
> first_page, num_pages and a bitmap of bits to clear.  The bitmap will
> usually be whatever you get from KVM_GET_DIRTY_LOG, but it can also be a
> superset including bits from DMA destinations.  Because it takes a
> first_page and num_pages, the time spent with the mmu_lock taken can be
> made smaller.
> 
> However, if I remember correctly, within GCE you are already working
> around this by defining guests with multiple memory slots.  So perhaps
> it wouldn't help the above extreme 50%+ case.

We do use multiple memory slots, but these days most of the memory is actually in a single slot (that used to be different earlier), so the optimization that you mentioned could still be useful. (I am assuming that you could call KVM_CLEAR_DIRTY_LOG multiple times with different subsets of the bitmap.) In addition to the other benefits that you mentioned, this scheme also means that we could skip the KVM_CLEAR_DIRTY_LOG ioctl altogether during blackout, which should help decrease the blackout time. I doubt that it would do much for the brownout degradation in the extreme case that I had mentioned, though.
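To make sure I'm picturing the interface correctly, something like the sketch below is what I'd expect userspace to do during brownout. This is a rough sketch only: the struct layout is taken from your description above (slot, first_page, num_pages, bitmap), the ioctl number is not assigned yet, and the 64-page alignment of chunks is my own assumption.

    #include <sys/ioctl.h>
    #include <linux/types.h>

    /* Layout per the description above; not a final ABI. */
    struct kvm_clear_dirty_log {
            __u32 slot;
            __u64 first_page;
            __u64 num_pages;
            __u64 *dirty_bitmap;    /* bits to clear, usually from KVM_GET_DIRTY_LOG */
    };

    /*
     * Re-protect one slot in fixed-size chunks so that each ioctl only
     * does a bounded amount of work under mmu_lock.  Assumes chunk_pages
     * is a multiple of 64 so the bitmap pointer stays word-aligned.
     */
    static void clear_dirty_in_chunks(int vm_fd, __u32 slot, __u64 *bitmap,
                                      __u64 slot_pages, __u64 chunk_pages)
    {
            __u64 first, n;

            for (first = 0; first < slot_pages; first += chunk_pages) {
                    struct kvm_clear_dirty_log clear;

                    n = slot_pages - first;
                    if (n > chunk_pages)
                            n = chunk_pages;

                    clear.slot = slot;
                    clear.first_page = first;
                    clear.num_pages = n;
                    clear.dirty_bitmap = bitmap + first / 64;

                    /* KVM_CLEAR_DIRTY_LOG ioctl number still to be assigned. */
                    ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
            }
    }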

> 
> Also, out of curiosity, since you have probably done benchmarks more
> recently than me, how bad is the impact of enabling write protection on
> large guests?  I recall the initial impact at the beginning of live
> migration was pretty bad, and possibly even seeing soft lockups due to
> excessive time spent holding the mmu_lock.  Perhaps we could reduce it
> by setting the dirty bitmap to all ones when KVM_MEM_LOG_DIRTY_PAGES is
> added to a memslot.  Together with the above optimization, this would
> spread the cost of removing write access over the bulk RAM transfer.
> This change in semantics could be tied to the same capability that
> introduces manual dirty-log reprotection.

I haven't done any recent benchmarks myself, but I think some others here have. (Peter may have more information about that.) I do recall that we saw similar soft lockups on very large VMs when switching the dirty logging mode from D-bit to write protection. We worked around it by inserting cond_resched() calls in the loop.
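Roughly, the workaround has the same shape as the reschedule point that slot_handle_level_range() already has upstream; a simplified sketch is below (the iteration helper and write_protect_range() are placeholders, not real KVM symbols, and this is not the exact code we carry):

    /*
     * Flush pending TLBs and briefly drop mmu_lock whenever a reschedule
     * is due, so that write-protecting a huge slot cannot keep a CPU busy
     * long enough to trip the soft lockup detector.
     */
    bool flush = false;

    spin_lock(&kvm->mmu_lock);
    for_each_gfn_range_in_slot(memslot, gfn_start, gfn_end) {
            flush |= write_protect_range(kvm, memslot, gfn_start, gfn_end);

            if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
                    if (flush) {
                            kvm_flush_remote_tlbs(kvm);
                            flush = false;
                    }
                    cond_resched_lock(&kvm->mmu_lock);
            }
    }
    if (flush)
            kvm_flush_remote_tlbs(kvm);
    spin_unlock(&kvm->mmu_lock);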

For the initial enabling of dirty logging, setting the dirty bitmap to all 1s and then clearing it piecemeal as the RAM transfer progresses could be helpful too. Though I suppose that for the pure write-protection based dirty logging mode, Xiao Guangrong's fast write protect scheme might be better.
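On the kernel side, the change for that initial state looks like it could be quite small, something along these lines at the point where the dirty bitmap is set up for a KVM_MEM_LOG_DIRTY_PAGES slot (a sketch only; the capability flag name here is made up):

    /*
     * Sketch only: when the new capability is enabled (flag name is made
     * up), start with the whole slot reported as dirty instead of clean,
     * so the initial write protection happens lazily via the clear ioctl
     * as the bulk RAM transfer walks the slot.
     */
    if (kvm->manual_dirty_log_protect)
            bitmap_fill(memslot->dirty_bitmap, memslot->npages);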

Thanks,
Junaid


