On Thu, Dec 10, 2015 at 5:09 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Dec 10, 2015 at 08:30:01AM +1100, NeilBrown wrote:
>> On Wed, Dec 09 2015, Peter Zijlstra wrote:
>>
>> > On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
>> >> On Tue, Dec 08 2015, Peter Zijlstra wrote:
>> >>
>> >> >
>> >> > *sigh*, so that patch was broken.. the below might fix it, but please
>> >> > someone look at it, I seem to have a less than stellar track record
>> >> > here...
>> >>
>> >> This new change seems to be more intrusive than should be needed.
>> >> Can't we just do:
>> >>
>> >>
>> >> __sched int bit_wait(struct wait_bit_key *word)
>> >> {
>> >> +       long state = current->state;
>> >
>> > No, current->state can already be changed by this time.
>>
>> Does that matter?
>> It can only have changed to TASK_RUNNING - right?
>> In that case signal_pending_state() will return 0 and the bit_wait() acts
>> as though the thread was woken up normally (which it was) rather than by
>> a signal (which maybe it was too, but maybe that happened just a tiny
>> bit later).
>>
>> As long as signal delivery doesn't change ->state, we should be safe.
>> We should even be safe testing ->state *after* the call to schedule().
>
> Blergh, all I've managed so far is to confuse myself further. Even
> something like the original (+- the EINTR) should work when we consider
> the looping, even when mixed with an occasional spurious wakeup.
>
>
> int bit_wait()
> {
>         if (signal_pending_state(current->state, current))
>                 return -EINTR;
>         schedule();
> }
>
>
> This can go wrong against raising a signal thusly:
>
>         prepare_to_wait()
> 1:      if (signal_pending_state(current->state, current))
>                 // false, nothing pending
>         schedule();
>                                 set_tsk_thread_flag(t, TIF_SIGPENDING);
>
>         <spurious wakeup>
>
>         prepare_to_wait()
>                                 wake_up_state(t, ...);
> 2:      if (signal_pending_state(current->state, current))
>                 // false, TASK_RUNNING
>
>         schedule(); // doesn't block because pending

Note that a quick inspection does not turn up _any_ TASK_INTERRUPTIBLE
callers.  When this previously occurred, it could likely only be with
a fatal signal, which would have hidden these sins.

>
>         prepare_to_wait()
> 3:      if (signal_pending_state(current->state, current))
>                 // true, pending
>

Hugh asked me about this after seeing a crash. Here's another exciting
way in which the current code breaks, and this one is actually quite
serious:

Consider __lock_page:

void __lock_page(struct page *page)
{
        DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
        __wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
                           TASK_UNINTERRUPTIBLE);
}

With the current state of the world,

 __sched int bit_wait_io(struct wait_bit_key *word)
 {
-       if (signal_pending_state(current->state, current))
-               return 1;
        io_schedule();
+       if (signal_pending(current))
+               return -EINTR;
        return 0;
 }

is called from __wait_on_bit_lock.

Previously, signal_pending_state() was checked under
TASK_UNINTERRUPTIBLE (via prepare_to_wait_exclusive).  Now, we simply
check for the presence of any signal, and we do so after we have
returned to running state, e.g. post io_schedule() when somebody has
kicked the wait-queue.

However, this now means that __wait_on_bit_lock can return -EINTR up to
__lock_page, which does not validate the return code and blindly
returns.
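To make the difference between the two checks concrete, here is a small
userspace model. It is only a sketch, not kernel code: the struct, the
model_* helpers and the flag values are stand-ins I made up to mirror
the kernel names, and model_signal_pending_state() merely imitates the
logic of signal_pending_state() as I understand it. It compares "any
signal pending after wakeup" with "signal pending, judged against the
mode we went to sleep in".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model values only; they imitate the kernel's task-state flags. */
#define TASK_RUNNING         0x00
#define TASK_INTERRUPTIBLE   0x01
#define TASK_UNINTERRUPTIBLE 0x02
#define TASK_WAKEKILL        0x80
#define TASK_KILLABLE        (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the bits of task_struct we care about. */
struct model_task {
	long state;
	bool sigpending;	/* models TIF_SIGPENDING */
	bool fatal;		/* models a pending fatal signal */
};

/* Imitates signal_pending_state(): only interruptible or killable
 * sleeps may be cut short, and killable ones only by a fatal signal. */
int model_signal_pending_state(long state, const struct model_task *t)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!t->sigpending)
		return 0;
	return (state & TASK_INTERRUPTIBLE) || t->fatal;
}

/* Models schedule()/io_schedule() returning: we are running again. */
void model_schedule(struct model_task *t)
{
	t->state = TASK_RUNNING;
}

/* Check as in the diff above: any pending signal, tested after wakeup. */
int model_bit_wait_any_signal(struct model_task *t)
{
	model_schedule(t);
	return t->sigpending ? -EINTR : 0;
}

/* Alternative: test after wakeup, but against the mode we slept in. */
int model_bit_wait_mode(struct model_task *t, long mode)
{
	model_schedule(t);
	return model_signal_pending_state(mode, t) ? -EINTR : 0;
}

/* Sleep in 'mode' with a signal already pending and report the result. */
int model_run(long mode, bool fatal, bool use_mode)
{
	struct model_task t = { .state = mode, .sigpending = true,
				.fatal = fatal };

	return use_mode ? model_bit_wait_mode(&t, mode)
			: model_bit_wait_any_signal(&t);
}

/* Usage example: the __lock_page case sleeps TASK_UNINTERRUPTIBLE. */
int model_demo(void)
{
	/* Unconditional check leaks -EINTR to an uninterruptible waiter. */
	assert(model_run(TASK_UNINTERRUPTIBLE, false, false) == -EINTR);
	/* Mode-aware check keeps the uninterruptible waiter waiting. */
	assert(model_run(TASK_UNINTERRUPTIBLE, false, true) == 0);
	return 0;
}
```

In this model a TASK_UNINTERRUPTIBLE waiter such as __lock_page only
sees -EINTR with the unconditional check. The mode-aware variant still
reports -EINTR for TASK_INTERRUPTIBLE sleepers, and for TASK_KILLABLE
sleepers when the signal is fatal.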

This looks to have been a previously existing bug, but it was at least
masked by the fact that it required a fatal signal previously (and
that the page we return unlocked is likely going to be freed from the
dying process anyway).

Peter's proposed follow-up above looks strictly more correct.  We need
to evaluate the potential existence of a signal, *after* we return
from schedule, but in the context of the state which we previously
_entered_ schedule() on.

Reviewed-by: Paul Turner <pjt@xxxxxxxxxx>