On 09/12/19 22:54, Peter Xu wrote: > Just until recently I noticed that actually kvm_get_running_vcpu() has > a real benefit in that it gives a very solid result on whether we're > with the vcpu context, even more accurate than when we pass vcpu > pointers around (because sometimes we just passed the kvm pointer > along the stack even if we're with a vcpu context, just like what we > did with mark_page_dirty_in_slot). Right, that's the point. > I'm thinking whether I can start > to use this information in the next post on solving an issue I > encountered with the waitqueue. > > Current waitqueue is still problematic in that it could wait even with > the mmu lock held when with vcpu context. I think the idea of the soft limit is that the waiting just cannot happen. That is, the number of dirtied pages _outside_ the guest (guest accesses are taken care of by PML, and are subtracted from the soft limit) cannot exceed hard_limit - (soft_limit + pml_size). > The issue is KVM_RESET_DIRTY_RINGS needs the mmu lock to manipulate > the write bits, while it's the only interface to also wake up the > dirty ring sleepers. They could dead lock like this: > > main thread vcpu thread > =========== =========== > kvm page fault > mark_page_dirty_in_slot > mmu lock taken > mark dirty, ring full > queue on waitqueue > (with mmu lock) > KVM_RESET_DIRTY_RINGS > take mmu lock <------------ deadlock here > reset ring gfns > wakeup dirty ring sleepers > > And if we see if the mark_page_dirty_in_slot() is not with a vcpu > context (e.g. kvm_mmu_page_fault) but with an ioctl context (those > cases we'll use per-vm dirty ring) then it's probably fine. > > My planned solution: > > - When kvm_get_running_vcpu() != NULL, we postpone the waitqueue waits > until we finished handling this page fault, probably in somewhere > around vcpu_enter_guest, so that we can do wait_event() after the > mmu lock released I think this can cause a race: vCPU 1 vCPU 2 host --------------------------------------------------------------- mark page dirty write to page treat page as not dirty add page to ring where vCPU 2 skips the clean-page slow path entirely. Paolo