Re: [PATCH next] sbitmap: fix lockup while swapping

Hugh Dickins <hughd@xxxxxxxxxx> · Fri, 23 Sep 2022 14:29:37 -0700 (PDT)

On Fri, 23 Sep 2022, Keith Busch wrote:

> Does the following fix the observation? Rational being that there's no reason
> to spin on the current wait state that is already under handling; let
> subsequent clearings proceed to the next inevitable wait state immediately.

It's running fine without lockup so far; but doesn't this change merely
narrow the window?  If this is interrupted in between atomic_try_cmpxchg()
setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
don't we run the same risk as before, of sbitmap_queue_wake_up() from
the interrupt handler getting stuck on that wait_cnt 0?

> 
> ---
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index 624fa7f118d1..47bf7882210b 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -634,6 +634,13 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
>  
>  	*nr -= sub;
>  
> +	/*
> +	 * Increase wake_index before updating wait_cnt, otherwise concurrent
> +	 * callers can see valid wait_cnt in old waitqueue, which can cause
> +	 * invalid wakeup on the old waitqueue.
> +	 */
> +	sbq_index_atomic_inc(&sbq->wake_index);
> +
>  	/*
>  	 * When wait_cnt == 0, we have to be particularly careful as we are
>  	 * responsible to reset wait_cnt regardless whether we've actually
> @@ -660,13 +667,6 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
>  	 * of atomic_set().
>  	 */
>  	smp_mb__before_atomic();
> -
> -	/*
> -	 * Increase wake_index before updating wait_cnt, otherwise concurrent
> -	 * callers can see valid wait_cnt in old waitqueue, which can cause
> -	 * invalid wakeup on the old waitqueue.
> -	 */
> -	sbq_index_atomic_inc(&sbq->wake_index);
>  	atomic_set(&ws->wait_cnt, wake_batch);
>  
>  	return ret || *nr;
> --