Re: [PATCH v2 0/3] kvm/mm: Allow GUP to respond to non fatal signals

Any further comments?  Thanks,

On Wed, Jul 20, 2022 at 08:03:15PM -0400, Peter Xu wrote:
> v2:
> - Added r-b
> - Rewrote the comment in faultin_page() for FOLL_INTERRUPTIBLE [John]
> - Dropped the controversial patch to introduce a flag for
>   __gfn_to_pfn_memslot(), instead used a boolean for now [Sean]
> - Renamed s/is_sigpending_pfn/KVM_PFN_ERR_SIGPENDING/ [Sean]
> - Changed the comment in kvm_faultin_pfn() mentioning fatal signals [Sean]
> 
> rfc: https://lore.kernel.org/kvm/20220617014147.7299-1-peterx@xxxxxxxxxx
> v1:  https://lore.kernel.org/kvm/20220622213656.81546-1-peterx@xxxxxxxxxx
> 
> An issue was reported that libvirt cannot stop the virtual machine with
> the QMP command "stop" while a postcopy migration is paused [1].
> 
> It doesn't work because the "stop the VM" operation requires the
> hypervisor to kick all the vcpu threads out using SIG_IPI in QEMU (which
> is translated to a SIGUSR1).  However, during a paused postcopy the vcpu
> threads are hung in handle_userfault(), so they simply don't respond to
> the kicks.  Worse, the "stop" command then hangs the QMP channel as well.
> 
> The mm has a facility to process generic signals
> (FAULT_FLAG_INTERRUPTIBLE), but it's only used in the PF handlers, not in
> GUP.  Unfortunately, KVM is a heavy GUP user on guest page faults.  This
> means that, with what we have right now, we can't interrupt a long page
> fault while KVM is fetching guest pages.
> 
> I think it's reasonable for GUP to only listen to fatal signals by
> default, as most GUP users are not really ready to handle non-fatal ones.
> But KVM is not such a user: it has rich infrastructure to handle even
> generic signals and to properly deliver them to userspace.  The page
> fault can then be retried on the next KVM_RUN.
> 
> This patchset adds FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE,
> and lets KVM be its first user.  KVM and mm/gup have always been able to
> respond to fatal signals, but not to non-fatal ones until this patchset.
> 
> One thing to mention is that this does not allow all KVM paths to respond
> to non-fatal signals, only x86 slow page faults.  In the future, when more
> code is ready to handle signal interruptions, we can explore having more
> GUP callers use FOLL_INTERRUPTIBLE.
> 
> Tests
> =====
> 
> I created a postcopy environment and paused the migration by shutting
> down the network to emulate a network failure (so handle_userfault() gets
> stuck for a long time), then tried three things:
> 
>   (1) Sending the QMP command "stop" to the QEMU monitor,
>   (2) Hitting Ctrl-C on the QEMU command line,
>   (3) Attaching GDB to the destination QEMU process.
> 
> Before this patchset, all three use cases hang.  After the patchset, all
> of them work just as if there were no network failure at all.
> 
> Please have a look, thanks.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/issues/1052
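
To illustrate the difference described above between fatal-only and interruptible GUP, here is a minimal userspace model.  It is not kernel code: model_faultin(), struct model_task and all MODEL_* names are hypothetical stand-ins, and a bounded polling loop replaces the real sleep in handle_userfault().

```c
#include <stdbool.h>

/* Hypothetical flag for this model only; the real FOLL_INTERRUPTIBLE
 * value is defined by the kernel patchset, not here. */
#define MODEL_FOLL_INTERRUPTIBLE 0x1u

enum model_fault_result {
	MODEL_FAULT_DONE,       /* page resolved, fault completed */
	MODEL_FAULT_FATAL,      /* fatal signal (e.g. SIGKILL) pending */
	MODEL_FAULT_SIGPENDING, /* non-fatal signal: return to the caller */
	MODEL_FAULT_STUCK,      /* still waiting when the model gave up */
};

struct model_task {
	bool fatal_signal;   /* e.g. SIGKILL */
	bool signal_pending; /* e.g. SIGUSR1, which QEMU uses as SIG_IPI */
};

/*
 * Models a fault waiting in handle_userfault() for a page that arrives
 * after page_arrives_after polls (negative: never, as in a paused
 * postcopy).  max_polls bounds the model so it cannot truly hang.
 */
enum model_fault_result model_faultin(const struct model_task *t,
				      unsigned int foll_flags,
				      int page_arrives_after, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (page_arrives_after >= 0 && i >= page_arrives_after)
			return MODEL_FAULT_DONE;
		/* GUP always honors fatal signals. */
		if (t->fatal_signal)
			return MODEL_FAULT_FATAL;
		/* Non-fatal signals break the wait only if the caller opted in. */
		if ((foll_flags & MODEL_FOLL_INTERRUPTIBLE) && t->signal_pending)
			return MODEL_FAULT_SIGPENDING;
	}
	return MODEL_FAULT_STUCK;
}
```

With only SIGUSR1 pending and no page ever arriving, model_faultin(&t, 0, -1, 1000) returns MODEL_FAULT_STUCK (the hang from the report), while passing MODEL_FOLL_INTERRUPTIBLE returns MODEL_FAULT_SIGPENDING; at that point a KVM-like caller would exit to userspace and retry the fault on the next KVM_RUN.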

-- 
Peter Xu




