On Tue, May 07, 2019 at 01:05:28PM -0400, Rik van Riel wrote:
> The code in xlog_wait uses the spinlock to make adding the task to
> the wait queue, and setting the task state to UNINTERRUPTIBLE, atomic
> with respect to the waker.
>
> Doing the wakeup after releasing the spinlock opens up the following
> race condition:
>
> - add task to wait queue
> - wake up task
> - set task state to UNINTERRUPTIBLE
>
> Simply moving the spin_unlock to after the wake_up_all results
> in the waker not being able to see a task on the waitqueue before
> it has set its state to UNINTERRUPTIBLE.

Yup, seems like an issue. Good find, Rik.

So, what problem is this actually fixing? Was it noticed by inspection,
or is it actually manifesting on production machines? If it is
manifesting IRL, what are the symptoms (e.g. a hang after running out
of log space?) and do you have a test case or any other way to exercise
it easily?

And, FWIW, did you check all the other xlog_wait() users for the same
problem?

> The lock ordering of taking the waitqueue lock inside the l_icloglock
> is already used inside xlog_wait; it is unclear why the waker was doing
> things differently.

Historical, most likely. The wakeup code has changed over the years,
and a race condition that rarely manifests is unlikely to be noticed.

....

Yeah, the conversion from a counting semaphore outside the iclog lock
to wait queues in 2008 introduced this bug: the wait queue addition was
moved inside the lock, but the wakeup was left outside it. So:

Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t")

Apart from the commit message, the change looks fine.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
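
For reference, here is a stripped-down sketch of the two wakeup
orderings under discussion. It is only a sketch, not the actual fs/xfs
code: the sleep side is modelled on the in-tree xlog_wait(), the waker
functions are hypothetical stand-ins, and the condition the sleeper
checks under the lock is elided.

#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/*
 * Sleep side, modelled on xlog_wait(): called with 'lock' held, the
 * task queues itself, marks itself TASK_UNINTERRUPTIBLE, then drops
 * the lock and sleeps.
 */
static inline void sketch_wait(wait_queue_head_t *wq, spinlock_t *lock)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(wq, &wait);
	__set_current_state(TASK_UNINTERRUPTIBLE);
	spin_unlock(lock);
	schedule();
	remove_wait_queue(wq, &wait);
}

/*
 * Old ordering: the wakeup runs after 'lock' has been dropped, so it
 * is not serialised against the sleep sequence above.  It can run
 * between add_wait_queue_exclusive() and __set_current_state(), find
 * the queued task still TASK_RUNNING, and the wakeup is lost.
 */
static void sketch_wake_racy(wait_queue_head_t *wq, spinlock_t *lock)
{
	spin_lock(lock);
	/* ... make the condition the sleeper checks under 'lock' true ... */
	spin_unlock(lock);
	wake_up_all(wq);
}

/*
 * Fixed ordering: issuing the wakeup before dropping 'lock' orders it
 * against the whole sleep sequence, which also runs under 'lock', so
 * the waker either sees no task on the queue or sees it already in
 * TASK_UNINTERRUPTIBLE.
 */
static void sketch_wake_fixed(wait_queue_head_t *wq, spinlock_t *lock)
{
	spin_lock(lock);
	/* ... make the condition true ... */
	wake_up_all(wq);
	spin_unlock(lock);
}

The fixed ordering only works because the sleep side does its
add_wait_queue_exclusive() and __set_current_state() under the same
lock, which is exactly the lock ordering xlog_wait() already uses.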