At Facebook, we have hit a bug in an interaction between KVM and btrfs running android emulators in a build/test environment using the android emulator's ram snapshot features (-snapshot and -snapstorage). The important aspect of those features is that they result in qemu mmap-ing a file rather than anonymous memory for the guest's memory. Ultimately, we observe (with drgn) pages of the mapped file stuck in btrfs writeback because the mapping's xarray lacks the expected dirty tags. I have not yet been able to pin down the exact vm behavior that results in these bad kvm_set_pfn_dirty calls, but I caught them by instrumenting SetPageDirty with a warning, getting a stack trace like: RIP: 0010:kvm_set_pfn_dirty+0xaf/0xd0 [kvm] <snip> Call Trace: kvm_release_pfn+0x2d/0x40 [kvm] __kvm_map_gfn+0x115/0x2b0 [kvm] kvm_arch_vcpu_ioctl_run+0x1538/0x1b30 [kvm] ? call_function_interrupt+0xa/0x20 kvm_vcpu_ioctl+0x232/0x5e0 [kvm] ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x42/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kvm_arch_vcpu_ioctl_run+0x1538 is the call to complete_userspace_io on line 8728, for what it's worth. I also confirmed that the page being dirtied in this codepath is the one we end up stuck on. This is on a kernel based off of 5.6, but as far as I can tell, the behavior in KVM is still incorrect currently, as it doesn't account for file backed pages. I tested this fix on the workload and it did prevent the hangs. However, I am unsure if the fix is appropriate from a locking perspective, so I hope to draw some extra attention to that aspect. set_page_dirty_lock in mm/page-writeback.c has a comment about locking that says set_page_dirty should be called with the page locked or while definitely holding a reference to the mapping's host inode. I believe that the mmap should have that reference, so for fear of hurting KVM performance or introducing a deadlock, I opted for the unlocked variant. Signed-off-by: Boris Burkov <boris@xxxxxx> --- virt/kvm/kvm_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2755ba4177d6..432c109664c3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2811,7 +2811,7 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty); void kvm_set_pfn_dirty(kvm_pfn_t pfn) { if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) - SetPageDirty(pfn_to_page(pfn)); + set_page_dirty(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); -- 2.34.0