On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> On 11/17/22 17:39, Sean Christopherson wrote:
> > Right, what I'm saying is that this approach is still sub-optimal because it does
> > all that work while holding mmu_lock for write.
> >
> > > Also, David's test used a 10-second halving time for the recovery thread.
> > > With the 1 hour time the effect would be much smaller.  Perhaps the 1 hour
> > > time used by default by KVM is overly conservative, but 1% over 10 seconds
> > > is certainly a lot larger an effect than 1% over 1 hour.
> >
> > It's not the CPU usage I'm thinking of, it's the unnecessary blockage of MMU
> > operations on other tasks/vCPUs.  Given that this is related to dirty logging,
> > odds are very good that there will be a variety of operations in flight, e.g.
> > KVM_GET_DIRTY_LOG.  If the recovery ratio is aggressive, and/or there are a lot
> > of pages to recover, the recovery thread could hold mmu_lock until a resched is
> > needed.
>
> If you need that, you need to configure your kernel to be preemptible, at
> least voluntarily.  That's in general a good idea for KVM, given its
> rwlock-happiness.

IMO, it's not that simple.  We always "need" better live migration performance,
but we don't need/want preemption in general.

> And the patch is not making it worse, is it?  Yes, you have to look up the
> memslot, but the work to do that should be less than what you save by not
> zapping the page.

Yes, my objection is that we're adding a heuristic to guess at userspace's
intentions (it's probably a good guess, but still) and the resulting behavior
isn't optimal.  Giving userspace an explicit knob seems straightforward and
would address both of those issues, so why not go that route?

> Perhaps we could add to struct kvm a counter of the number of log-pages
> memslots.  While a correct value would only be readable with slots_lock
> taken, the NX recovery thread is content with an approximate value and
> therefore can retrieve it with READ_ONCE or atomic_read().
>
> Paolo
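
A rough, completely untested sketch of what that approximate counter might look
like (field and helper names below are purely illustrative, not KVM's actual
API; updates happen under slots_lock, reads are lockless):

/* Hypothetical field added to struct kvm: */
	atomic_t nr_memslots_dirty_logging;

/* Called under slots_lock when a memslot is created, changed, or deleted. */
static void update_dirty_logging_count(struct kvm *kvm,
				       const struct kvm_memory_slot *old,
				       const struct kvm_memory_slot *new)
{
	bool was_logging = old && (old->flags & KVM_MEM_LOG_DIRTY_PAGES);
	bool now_logging = new && (new->flags & KVM_MEM_LOG_DIRTY_PAGES);

	if (was_logging == now_logging)
		return;

	if (now_logging)
		atomic_inc(&kvm->nr_memslots_dirty_logging);
	else
		atomic_dec(&kvm->nr_memslots_dirty_logging);
}

/*
 * NX recovery thread: an approximate value is fine, so a lockless read
 * suffices; no need to take slots_lock.
 */
static bool kvm_dirty_logging_active(struct kvm *kvm)
{
	return atomic_read(&kvm->nr_memslots_dirty_logging) > 0;
}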