Quoting Matthew Wilcox (2018-04-02 15:10:58)
>
> Souptick and I have been auditing the various page fault handler routines
> and we've noticed that graphics drivers assume that a signal should be
> able to interrupt a page fault. In contrast, the page cache takes great
> care to allow only fatal signals to interrupt a page fault.
>
> I believe (but have not verified) that a non-fatal signal being delivered
> to a task which is in the middle of a page fault may well end up in an
> infinite loop, attempting to handle the page fault and failing forever.
>
> Here's one of the simpler ones:
>
>         ret = mutex_lock_interruptible(&etnaviv_obj->lock);
>         if (ret)
>                 return VM_FAULT_NOPAGE;
>
> (many other drivers do essentially the same thing including i915)
>
> On seeing NOPAGE, the fault handler believes the PTE is in the page
> table, so does nothing before it returns to arch code at which point
> I get lost in the magic assembler macros. I believe it will end up
> returning to userspace if the signal is non-fatal, at which point it'll
> go right back into the page fault handler, and mutex_lock_interruptible()
> will immediately fail. So we've converted a sleeping lock into the most
> expensive spinlock.

I'll ask the obvious question: why isn't the signal handled on return to
userspace?

> I don't think the graphics drivers really want to be interrupted by
> any signal.

Assume the worst case and we may block for 10s. Even a 10ms delay may be
unacceptable to some signal handlers (one presumes). For the number one
^C usecase, yes that may be reduced to only bother if it's killable, but
I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19)
that want to be able to interrupt random blockages.
-Chris
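
To make the two positions above concrete, here is a rough userspace model of
the retry behaviour under discussion. It is only a sketch and not kernel code:
the names task_model, model_fault_handler and simulate_fault are hypothetical,
and the single assumption being toggled (deliver_on_return) is whether the
pending non-fatal signal gets consumed on the way back to userspace before the
faulting instruction is retried. Which of the two cases matches real arch
behaviour is exactly the open question in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not the kernel's definitions. */
enum fault_result { FAULT_DONE, FAULT_NOPAGE };

struct task_model {
    bool signal_pending;    /* a non-fatal signal is queued for the task */
};

/*
 * Models a handler that takes its lock interruptibly: while a signal is
 * pending the lock attempt fails and the handler reports NOPAGE without
 * having installed a PTE.
 */
static enum fault_result model_fault_handler(const struct task_model *t)
{
    if (t->signal_pending)
        return FAULT_NOPAGE;
    return FAULT_DONE;
}

/*
 * Returns how many handler invocations were needed to resolve the fault,
 * or -1 if it was still unresolved after max_attempts.
 */
static int simulate_fault(bool deliver_on_return, int max_attempts)
{
    struct task_model t = { .signal_pending = true };

    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_fault_handler(&t) == FAULT_DONE)
            return attempt;

        /*
         * "Return to userspace": if the signal is delivered here, it is
         * no longer pending when the instruction faults again.
         */
        if (deliver_on_return)
            t.signal_pending = false;
    }
    return -1;
}
```

In this model, simulate_fault(true, 100) resolves on the second attempt, while
simulate_fault(false, 100) never resolves within the bound, which corresponds
to the "most expensive spinlock" scenario in the quoted text.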