Clean up a KVM module refcounting mess that Al pointed out in the context
of the guest_memfd series.

The worst behavior was recently introduced by an ill-fated attempt to fix
a bug in x86's async #PF code.  Instead of fixing the underlying bug of
not flushing a workqueue (see patch 2), KVM fudged around the bug by
gifting every VM a reference to the KVM module.

That made the reproducer happy (hopefully there was actually a reproducer
at one point), but it didn't fully fix the use-after-free bug, it just
made the bug harder to hit.  E.g. as pointed out by Al, if
kvm_destroy_vm() is preempted after putting the last KVM module
reference, KVM can be unloaded before kvm_destroy_vm() completes, and
scheduling back in the associated task will explode (preemption isn't
strictly required, it's just the most obvious path to failure).

Then after applying that "fix", we/I made an even bigger goof by relying
on the nonexistent "protection" provided by the VM's reference and
removed the code which guaranteed that the KVM module would be pinned
until *after* the last reference to a KVM-owned file was put.

Undo the mess we created and fix the original async #PF workqueue bug.

Sean Christopherson (3):
  KVM: Set file_operations.owner appropriately for all such structures
  KVM: Always flush async #PF workqueue when vCPU is being destroyed
  Revert "KVM: Prevent module exit until all VMs are freed"

 arch/x86/kvm/debugfs.c |  1 +
 virt/kvm/async_pf.c    | 15 ++++++++++++---
 virt/kvm/kvm_main.c    | 18 ++++++++----------
 3 files changed, 21 insertions(+), 13 deletions(-)


base-commit: 437bba5ad2bba00c2056c896753a32edf80860cc
-- 
2.42.0.655.g421f12c284-goog