Re: fs: uninterruptible hang in handle_userfault

Andrea Arcangeli <aarcange@xxxxxxxxxx> · Wed, 2 Mar 2016 15:55:21 +0100

Hello,

On Wed, Mar 02, 2016 at 12:48:46AM +0000, Al Viro wrote:
> On Tue, Mar 01, 2016 at 12:06:49PM -0800, Linus Torvalds wrote:
> 
> > So the only access we really care about is the child tid-pointer
> > clearing one, and that always happens after PF_EXITING has been set
> > afaik.
> > 
> > No other case really matters. If somebody accesses a userfault region
> > just as another thread is exiting, we don't care. I don't think it
> > would necessarily be wrong to ignore the fault, but I don't think it's
> > relevant either, since at that stage the normal "you can signal the
> > thread" still works. It's only the child tid access that comes *after*
> > we have stopped acceping signals, and that's marked by that
> > PF_EXITING.
> > 
> > Or maybe I misunderstood your worry entirely or missed something, and
> > my answer above is entirely beside your point. Did you have something
> > else in mind?
> 
> No, I've misread de_thread()/zap_other_threads().  No objections to the
> patch.

I reviewed the fix (a) too and it's sure fine with me too.

So I evaluated if we could fix the deadlock by simply preventing to
run userland page faults while SIGKILL cannot be delivered anymore. I
think skipping those userland accesses wouldn't be strictly required
if they were run by a legit app with a userfaultfd manager thread
alive and well.

Running page faults that late in the exit path with signal disabled
was frankly unexpected. So I did a quick attempt to test such an
exit-code reordering change, but it's overall more risky and so far I
didn't succeed to have a SIGKILL reach handle_userfault() despite I
cleaned up that futex code into a proper futex_exit run just before
exit_signals instead of inside mm_release. Apparently it's not just
PF_EXITING that prevents SIGKILL to reach handle_userfault(). The
below change still didn't allow to kill the task:

+	exit_futex(tsk); /* run before setting PF_EXITING */
	exit_signals(tsk);  /* sets PF_EXITING */

So again I think for now fix (a) is preferable and I verified it too
with the test program. As far as the primary production user of
userfaulfd is concerned, no futex will run in the userfaultfd tracked
region so no matter what the futex exit code is there for, there's no
risk of memory corruption.

Thanks,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html