On 11/7/22 22:21, Sean Christopherson wrote:
> Hmm, and the memslot heuristic doesn't address the recovery worker holding mmu_lock for write. On a non-preemptible kernel, rwlock_needbreak() is always false, e.g. the worker won't yield to vCPUs that are trying to handle non-fast page faults. The worker should eventually reach steady state by unaccounting everything, but that might take a while.
I'm not sure what you mean here? The recovery worker will still decrease to_zap by 1 on every unaccounted NX hugepage, and go to sleep after it reaches 0.
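To spell out the bound, here is a rough userspace sketch, not the actual KVM code. The names struct nx_page, in_dirty_logged_slot and recover_once() are made up for illustration, and it assumes the heuristic unaccounts a page in a dirty-logged memslot without zapping it. The point is only that the budget is charged per page visited, so one wakeup never walks more than to_zap pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the list of possible NX huge pages. */
struct nx_page {
	bool in_dirty_logged_slot;	/* assumption: such pages are not zapped */
};

struct recovery_stats {
	size_t visited;	/* pages unaccounted; each one costs one unit of budget */
	size_t zapped;	/* pages actually zapped */
};

/*
 * One wakeup of the simulated recovery worker. The budget drops by one for
 * every page taken off the list, whether or not the page ends up zapped, so
 * the loop ends after at most to_zap iterations.
 */
struct recovery_stats recover_once(const struct nx_page *pages,
				   size_t nr_pages, size_t to_zap)
{
	struct recovery_stats st = { 0, 0 };

	for (size_t i = 0; i < nr_pages && to_zap > 0; i++, to_zap--) {
		st.visited++;
		if (!pages[i].in_dirty_logged_slot)
			st.zapped++;
	}
	return st;
}
```

With five pages on the list and a budget of three, the sketch visits three pages and then stops, even if most of them were skipped rather than zapped.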
Also, David's test used a 10-second halving time for the recovery thread. With the 1 hour time the effect would be much smaller. Perhaps the 1 hour time used by default by KVM is overly conservative, but 1% over 10 seconds is certainly a much larger effect than 1% over 1 hour.
So, I'm queuing the patch.

Paolo
An alternative idea to the memslot heuristic would be to add a knob to allow disabling the recovery thread on a per-VM basis. Userspace should know that it's dirty logging a given VM for migration.