On Sat, Aug 03, 2024 at 09:35:56AM +0100, David Woodhouse wrote: > On Fri, 2024-08-02 at 18:40 -0400, Peter Xu wrote: > > On Fri, Aug 02, 2024 at 01:03:16PM +0100, David Woodhouse wrote: > > > An alternative workaround (which perhaps we should *also* consider) > > > looked like this (plus some suitable code comment, of course): > > > > > > --- a/arch/x86/mm/fault.c > > > +++ b/arch/x86/mm/fault.c > > > @@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs, > > > */ > > > if (user_mode(regs)) > > > flags |= FAULT_FLAG_USER; > > > + else > > > + flags &= ~FAULT_FLAG_INTERRUPTIBLE; > > > > ... > > Instead of "interruptible exception" or the original patch (which might > > still be worthwhile, though? I didn't follow much on kvm and the new gpc > > cache, but looks still nicer than get/put user from initial glance), above > > Yes, I definitely think we want the GPC conversion anyway. That's why I > suggested it to Carsten, to resolve our *immediate* problem while we > continue to ponder the general case. > > > looks like the easier and complete solution to me. For "completeness", I > > mean I am not sure how many other copy_to/from_user() code in kvm can hit > > this, so looks like still possible to hit outside steal time page? > > Right. It theoretically applies to *any* user access. It's just that > anything other than *guest* pages is slightly less likely to be backed > by userfaultfd. > > > I thought only the slow fault path was involved in INTERRUPTIBLE thing and > > that was the plan, but I guess I overlooked how the default value could > > affect copy to/from user invoked from KVM as well.. > > > > With above patch to drop FAULT_FLAG_INTERRUPTIBLE for !user, KVM can still > > opt-in INTERRUPTIBLE anywhere by leveraging hva_to_pfn[_slow]() API, which > > is "INTERRUPTIBLE"-ready with a boolean the caller can set. But the caller > > will need to be able to process KVM_PFN_ERR_SIGPENDING. > > Right. I think converting kvm_{read,write}_guest() and friends to do > that and be interruptible might make sense? Makes sense to me. It's just that there seem to be a lot of the contexts that using this, so I'm not sure how much work needed to integrate the new KVM_PFN_ERR_SIGPENDING, and whether it'll be worthwhile. Also, not sure whether some context that can be too involved to only handle sigkill/quit. And this can also, logically, trigger with kvm_{read,write}_guest() or similar path already, right? I wonder why it can so far only trigger with steal time; I probably missed something. > > The patch snippet above obviously only fixes it for x86 and would need > to be done across the board. Unless we do this one instead, abusing the > knowledge that uffd is the only thing which honours > FAULT_FLAG_INTERRUPTIBLE? > > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -351,7 +351,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, > > static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) > { > - if (flags & FAULT_FLAG_INTERRUPTIBLE) > + if ((flags & FAULT_FLAG_INTERRUPTIBLE) && (flags & FAULT_FLAG_USER)) > return TASK_INTERRUPTIBLE; > > if (flags & FAULT_FLAG_KILLABLE) This smells hacky to me, as FAULT_FLAG_INTERRUPTIBLE is a fault API that the fault handler says "let's respond to non-fatal signals". It means here userfault is violating the ABI.. And, IIUC this concept of "handling non-fatal signal" can apply outside userfaultfd too. The one in my mind is __folio_lock_or_retry(). The previous change looks more reasonable, as I think it's a bug that in do_user_addr_fault() (just take x86 as example) it specifies the INTERRUPTIBLE but later after handle_mm_fault() it ignored it in fault_signal_pending() for !user. So it makes sense to me to have FAULT_FLAG_DEFAULT matching what fault_signal_pending() does. From that POV perhaps if FAULT_FLAG_DEFAULT can take "user" as input would be even cleaner (instead of clearing it later). > > I still quite like the idea of *optional* interruptible exceptions, as > seen in my proof of concept. Perhaps we wouldn't want the read(2) and > write(2) system calls to use them, but there are plenty of other system > calls which could be interruptible instead of blocking. I don't have enough much direct experience there, but it sounds reasonable to me. > > Right now, even the simple case of a trivial SIGINT handler which does > some minor cleanup before exiting, makes it a non-fatal signal so the > kernel blocks and waits for ever. > Thanks, -- Peter Xu