On Fri, 2 Apr 2021 at 08:59, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Avoid taking mmu_lock for unrelated .invalidate_range_{start,end}()
> notifications. Because mmu_notifier_count must be modified while holding
> mmu_lock for write, and must always be paired across start->end to stay
> balanced, lock elision must happen in both or none. To meet that
> requirement, add a rwsem to prevent memslot updates across range_start()
> and range_end().
>
> Use a rwsem instead of a rwlock since most notifiers _allow_ blocking,
> and the lock will be held across the entire start() ... end() sequence.
> If anything in the sequence sleeps, including the caller or a different
> notifier, holding the spinlock would be disastrous.
>
> For notifiers that _disallow_ blocking, e.g. OOM reaping, simply go down
> the slow path of unconditionally acquiring mmu_lock. The sane
> alternative would be to try to acquire the lock and force the notifier
> to retry on failure. But since OOM is currently the _only_ scenario
> where blocking is disallowed, attempting to optimize a guest that has been
> marked for death is pointless.
>
> Unconditionally define and use mmu_notifier_slots_lock in the memslots
> code, purely to avoid more #ifdefs. The overhead of acquiring the lock
> is negligible when the lock is uncontested, which will always be the case
> when the MMU notifiers are not used.
>
> Note, technically flag-only memslot updates could be allowed in parallel,
> but stalling a memslot update for a relatively short amount of time is
> not a scalability issue, and this is all more than complex enough.
>
> Based heavily on code from Ben Gardon.
>
> Suggested-by: Ben Gardon <bgardon@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>

I saw this splat:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.12.0-rc3+ #6 Tainted: G OE
 ------------------------------------------------------
 qemu-system-x86/3069 is trying to acquire lock:
 ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: __mmu_notifier_invalidate_range_end+0x5/0x190

 but task is already holding lock:
 ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}:
        down_read+0x48/0x250
        kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]
        __mmu_notifier_invalidate_range_start+0xe8/0x260
        wp_page_copy+0x82b/0xa30
        do_wp_page+0xde/0x420
        __handle_mm_fault+0x935/0x1230
        handle_mm_fault+0x179/0x420
        do_user_addr_fault+0x1b3/0x690
        exc_page_fault+0x82/0x2b0
        asm_exc_page_fault+0x1e/0x30

 -> #0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
        __lock_acquire+0x110f/0x1980
        lock_acquire+0x1bc/0x400
        __mmu_notifier_invalidate_range_end+0x47/0x190
        wp_page_copy+0x796/0xa30
        do_wp_page+0xde/0x420
        __handle_mm_fault+0x935/0x1230
        handle_mm_fault+0x179/0x420
        do_user_addr_fault+0x1b3/0x690
        exc_page_fault+0x82/0x2b0
        asm_exc_page_fault+0x1e/0x30

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&kvm->mmu_notifier_slots_lock);
                                lock(mmu_notifier_invalidate_range_start);
                                lock(&kvm->mmu_notifier_slots_lock);
   lock(mmu_notifier_invalidate_range_start);

  *** DEADLOCK ***

 2 locks held by qemu-system-x86/3069:
  #0: ffff9e4269f8a9e0 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x10e/0x690
  #1: ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

 stack backtrace:
 CPU: 0 PID: 3069 Comm: qemu-system-x86 Tainted: G OE 5.12.0-rc3+ #6
 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
 Call Trace:
  dump_stack+0x87/0xb7
  print_circular_bug.isra.39+0x1b4/0x210
  check_noncircular+0x103/0x150
  __lock_acquire+0x110f/0x1980
  ? __lock_acquire+0x110f/0x1980
  lock_acquire+0x1bc/0x400
  ? __mmu_notifier_invalidate_range_end+0x5/0x190
  ? find_held_lock+0x40/0xb0
  __mmu_notifier_invalidate_range_end+0x47/0x190
  ? __mmu_notifier_invalidate_range_end+0x5/0x190
  wp_page_copy+0x796/0xa30
  do_wp_page+0xde/0x420
  __handle_mm_fault+0x935/0x1230
  handle_mm_fault+0x179/0x420
  do_user_addr_fault+0x1b3/0x690
  ? rcu_read_lock_sched_held+0x4f/0x80
  exc_page_fault+0x82/0x2b0
  ? asm_exc_page_fault+0x8/0x30
  asm_exc_page_fault+0x1e/0x30
 RIP: 0033:0x55f5bef2560f
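
To make the two chains above easier to follow, here is a rough userspace model of the ordering lockdep seems to be objecting to. It is only a sketch, not kernel code: the enum names and the model_*() helpers are made up for illustration, and it assumes the slots lock taken in range_start() is only dropped in range_end(), as the changelog describes. Chain #1 records "slots lock taken under the range_start map"; chain #0 is the map being taken again in range_end() while the slots lock is still held.

```c
/* Userspace model of the two lock classes in the splat; hypothetical helpers. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum lock_class { RANGE_START_MAP, SLOTS_LOCK, NR_CLASSES };

/* dep[a][b] is set once b has been acquired while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];
static bool held[NR_CLASSES];

void model_reset(void)
{
	memset(dep, 0, sizeof(dep));
	memset(held, 0, sizeof(held));
}

/* Is there a recorded dependency chain from 'from' to 'to'? */
bool model_reachable(enum lock_class from, enum lock_class to, int depth)
{
	if (from == to)
		return true;
	if (depth >= NR_CLASSES)	/* bound the walk; chains are short */
		return false;
	for (int i = 0; i < NR_CLASSES; i++)
		if (dep[from][i] &&
		    model_reachable((enum lock_class)i, to, depth + 1))
			return true;
	return false;
}

/* Returns true if acquiring 'c' inverts an already recorded ordering. */
bool model_acquire(enum lock_class c)
{
	bool cycle = false;

	assert(c < NR_CLASSES);
	for (int h = 0; h < NR_CLASSES; h++) {
		if (!held[h] || h == (int)c)
			continue;
		if (model_reachable(c, (enum lock_class)h, 0))
			cycle = true;
		dep[h][c] = true;
	}
	held[c] = true;
	return cycle;
}

void model_release(enum lock_class c)
{
	assert(c < NR_CLASSES);
	held[c] = false;
}

/* One range_start() ... range_end() pair, following the two chains above. */
bool model_start_end(void)
{
	bool cycle = false;

	/* range_start: the map is held around the notifier callback */
	cycle |= model_acquire(RANGE_START_MAP);
	cycle |= model_acquire(SLOTS_LOCK);	/* down_read() in KVM's callback */
	model_release(RANGE_START_MAP);		/* slots lock stays held */

	/* range_end: the map is acquired with the slots lock still held */
	cycle |= model_acquire(RANGE_START_MAP);
	model_release(RANGE_START_MAP);
	model_release(SLOTS_LOCK);		/* assumed up_read() in range_end */
	return cycle;
}
```

In this model a single start/end pair is enough to record both orderings, which would match the splat firing from one page fault path rather than needing two tasks to race.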