Re: [PATCH 4.9 4.19] xfs: fix missed wakeup on l_flush_wait

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 30, 2020 at 02:35:07PM -0700, Samuel Mendoza-Jonas wrote:
> From: Rik van Riel <riel@xxxxxxxxxxx>
> 
> commit cdea5459ce263fbc963657a7736762ae897a8ae6 upstream
> 
> The code in xlog_wait uses the spinlock to make adding the task to
> the wait queue, and setting the task state to UNINTERRUPTIBLE atomic
> with respect to the waker.
> 
> Doing the wakeup after releasing the spinlock opens up the following
> race condition:
> 
> Task 1					task 2
> add task to wait queue
> 					wake up task
> set task state to UNINTERRUPTIBLE
> 
> This issue was found through code inspection as a result of kworkers
> being observed stuck in UNINTERRUPTIBLE state with an empty
> wait queue. It is rare and largely unreproducable.
> 
> Simply moving the spin_unlock to after the wake_up_all results
> in the waker not being able to see a task on the waitqueue before
> it has set its state to UNINTERRUPTIBLE.
> 
> This bug dates back to the conversion of this code to generic
> waitqueue infrastructure from a counting semaphore back in 2008
> which didn't place the wakeups consistently w.r.t. to the relevant
> spin locks.
> 
> [dchinner: Also fix a similar issue in the shutdown path on
> xc_commit_wait. Update commit log with more details of the issue.]
> 
> Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t")
> Reported-by: Chris Mason <clm@xxxxxx>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 4.9.x-4.19.x
> [modified for contextual change near xlog_state_do_callback()]
> Signed-off-by: Samuel Mendoza-Jonas <samjonas@xxxxxxxxxx>
> Reviewed-by: Frank van der Linden <fllinden@xxxxxxxxxx>
> Reviewed-by: Suraj Jitindar Singh <surajjs@xxxxxxxxxx>
> Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx>
> Reviewed-by: Anchal Agarwal <anchalag@xxxxxxxxxx>
> ---
> This issue was fixed in v5.4 but didn't appear to make it to stable. The
> fixed commit goes back to v2.6, so backport this to stable kernels
> before v5.4. The only difference is a contextual change at
> 	xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
> Which in v5.4 is
> 	xlog_state_do_callback(log, true, NULL);

Now queued up, thanks.

greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux