Re: [PATCH] KVM: use set_page_dirty rather than SetPageDirty

On Thu, Jan 27, 2022, Chris Mason wrote:
> 
> 
> > On Jan 26, 2022, at 6:11 PM, Boris Burkov <boris@xxxxxx> wrote:
> > 
> > On Wed, Jan 26, 2022 at 09:59:02PM +0000, Sean Christopherson wrote:
> >> On Wed, Jan 26, 2022, Boris Burkov wrote:
> >>> I tested this fix on the workload and it did prevent the hangs. However,
> >>> I am unsure if the fix is appropriate from a locking perspective, so I
> >>> hope to draw some extra attention to that aspect. set_page_dirty_lock in
> >>> mm/page-writeback.c has a comment about locking that says set_page_dirty
> >>> should be called with the page locked or while definitely holding a
> >>> reference to the mapping's host inode. I believe that the mmap should
> >>> have that reference, so for fear of hurting KVM performance or
> >>> introducing a deadlock, I opted for the unlocked variant.
> >> 
> >> KVM doesn't hold a reference per se, but it does subscribe to mmu_notifier events
> >> and will not mark the page dirty after KVM has been instructed to unmap the page
> >> (barring bugs, which we've had a slew of).  So yeah, the unlocked variant should
> >> be safe.
> >> 
> >> Is it feasible to trigger this behavior in a selftest?  KVM has had, and probably
> >> still has, many bugs that all boil down to KVM assuming guest memory is backed by
> >> either anonymous memory or something like shmem/HugeTLBFS/memfd that isn't typically
> >> truncated by the host.
> > 
> > I haven't been able to isolate a reproducer, yet. I am a bit stumped
> > because there isn't a lot for me to go off from that stack I shared--the
> > best I have so far is that I need to trick KVM into emulating
> > instructions at some point to get to this 'complete_userspace_io'
> > codepath? I will keep trying, since I think it would be valuable to know
> > what exactly happened. Open to try any suggestions you might have as
> > well.
> 
> From the btrfs side, bare calls to set_page_dirty() are suboptimal, since it
> doesn’t go through the ->page_mkwrite() dance that we use to properly COW
> things.  It’s still much better than SetPageDirty(), but I’d love to
> understand why kvm needs to dirty the page so we can figure out how to go
> through the normal mmap file io paths.

Ah, is the issue that writeback gets stuck because KVM perpetually marks the
page as dirty?  The page in question should have already gone through ->page_mkwrite().
Outside of one or two internal mmaps that KVM fully controls and are anonymous memory,
KVM doesn't modify VMAs.  KVM is calling SetPageDirty() to mark that it has written
to the page; KVM does so either when it unmaps the page from the guest or, in this
case, when it kunmap()'s a page that KVM itself accessed.
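
To make the SetPageDirty() vs. set_page_dirty() distinction concrete, here is a
toy userspace model.  It is a sketch, not kernel code: every toy_* name is
hypothetical, and the "mapping" is reduced to a single counter standing in for
whatever bookkeeping the filesystem does when it is told a page became dirty.

```c
/* Toy userspace model, NOT kernel code.  All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	int nr_dirty_tracked;	/* pages the "filesystem" knows it must write back */
};

struct toy_page {
	bool dirty;			/* stands in for the PG_dirty flag */
	struct toy_mapping *mapping;	/* owning file mapping, NULL if anonymous */
};

/* Models SetPageDirty(): flips the flag and tells nobody. */
static void toy_SetPageDirty(struct toy_page *page)
{
	assert(page != NULL);
	page->dirty = true;
}

/*
 * Models set_page_dirty(): sets the flag and, on a clean->dirty transition,
 * lets the owning mapping account for the page.  Returns true if the page
 * was newly dirtied by this call.
 */
static bool toy_set_page_dirty(struct toy_page *page)
{
	assert(page != NULL);
	if (page->dirty)
		return false;
	page->dirty = true;
	if (page->mapping)
		page->mapping->nr_dirty_tracked++;
	return true;
}
```

In this model, a page dirtied via toy_SetPageDirty() has its flag set while
nr_dirty_tracked stays at zero, i.e. the flag and the mapping's view disagree.
Whether that is the exact mechanism behind the hang is an assumption on my
part, not something established in this thread.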

Based on the call stack, my best guess is that KVM is updating steal_time info.
That's triggered when the vCPU is (re)loaded, which would explain the correlation
to complete_userspace_io() as KVM unloads=>reloads the vCPU before/after exiting
to userspace to handle emulated I/O.

Oh!  I assume that the page is either unmapped or made read-only before writeback?
v5.6 (and many kernels since) had a bug where KVM would "miss" mmu_notifier events
for the steal_time cache.  It's basically a use-after-free issue at that point.  See
commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status").
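
As a rough illustration of why a cache that misses invalidation events is
dangerous, here is another toy userspace model.  It is only a sketch of the
general idea: the toy_* names and the generation counter are hypothetical, and
this is not a description of how KVM's steal_time cache is implemented.

```c
/* Toy userspace model, NOT kernel code.  All toy_* names are hypothetical. */
#include <stdbool.h>

struct toy_mmu {
	unsigned long generation;	/* bumped on every invalidation event */
};

struct toy_cache {
	unsigned long generation;	/* generation at which the mapping was cached */
	bool valid;
};

/* Models an invalidation event: the host unmapped or write-protected the page. */
static void toy_invalidate(struct toy_mmu *mmu)
{
	mmu->generation++;
}

/* (Re)resolve the mapping and remember which generation it belongs to. */
static void toy_cache_refresh(struct toy_cache *cache, const struct toy_mmu *mmu)
{
	cache->generation = mmu->generation;
	cache->valid = true;
}

/* A cache that observes invalidations only trusts an up-to-date mapping. */
static bool toy_cache_usable(const struct toy_cache *cache,
			     const struct toy_mmu *mmu)
{
	return cache->valid && cache->generation == mmu->generation;
}
```

A user that checks toy_cache_usable() before every access refreshes after an
invalidation.  A user that only checks cache->valid keeps writing through the
stale mapping, which is the shape of the "missed event" bug described above.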


