On Fri, 23 Sep 2022, Keith Busch wrote: > Does the following fix the observation? Rational being that there's no reason > to spin on the current wait state that is already under handling; let > subsequent clearings proceed to the next inevitable wait state immediately. It's running fine without lockup so far; but doesn't this change merely narrow the window? If this is interrupted in between atomic_try_cmpxchg() setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index, don't we run the same risk as before, of sbitmap_queue_wake_up() from the interrupt handler getting stuck on that wait_cnt 0? > > --- > diff --git a/lib/sbitmap.c b/lib/sbitmap.c > index 624fa7f118d1..47bf7882210b 100644 > --- a/lib/sbitmap.c > +++ b/lib/sbitmap.c > @@ -634,6 +634,13 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr) > > *nr -= sub; > > + /* > + * Increase wake_index before updating wait_cnt, otherwise concurrent > + * callers can see valid wait_cnt in old waitqueue, which can cause > + * invalid wakeup on the old waitqueue. > + */ > + sbq_index_atomic_inc(&sbq->wake_index); > + > /* > * When wait_cnt == 0, we have to be particularly careful as we are > * responsible to reset wait_cnt regardless whether we've actually > @@ -660,13 +667,6 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr) > * of atomic_set(). > */ > smp_mb__before_atomic(); > - > - /* > - * Increase wake_index before updating wait_cnt, otherwise concurrent > - * callers can see valid wait_cnt in old waitqueue, which can cause > - * invalid wakeup on the old waitqueue. > - */ > - sbq_index_atomic_inc(&sbq->wake_index); > atomic_set(&ws->wait_cnt, wake_batch); > > return ret || *nr; > --