On Fri 23-09-22 16:15:29, Hugh Dickins wrote: > On Fri, 23 Sep 2022, Hugh Dickins wrote: > > On Fri, 23 Sep 2022, Keith Busch wrote: > > > > > Does the following fix the observation? Rational being that there's no reason > > > to spin on the current wait state that is already under handling; let > > > subsequent clearings proceed to the next inevitable wait state immediately. > > > > It's running fine without lockup so far; but doesn't this change merely > > narrow the window? If this is interrupted in between atomic_try_cmpxchg() > > setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index, > > don't we run the same risk as before, of sbitmap_queue_wake_up() from > > the interrupt handler getting stuck on that wait_cnt 0? > > Yes, it ran successfully for 50 minutes, then an interrupt came in > immediately after the cmpxchg, and it locked up just as before. > > Easily dealt with by disabling interrupts, no doubt, but I assume it's a > badge of honour not to disable interrupts here (except perhaps in waking). I don't think any magic with sbq_index_atomic_inc() is going to reliably fix this. After all the current waitqueue may be the only one that has active waiters so sbq_wake_ptr() will always end up returning this waitqueue regardless of the current value of sbq->wake_index. Honestly, this whole code needs a serious redesign. I have some simplifications in mind but it will take some thinking and benchmarking so we need some fix for the interim. I was pondering for quite some time about some band aid to the problem you've found but didn't find anything satisfactory. In the end I see two options: 1) Take your patch (as wrong as it is ;). Yes, it can lead to lost wakeups but we were living with those for a relatively long time so probably we can live with them for some longer. 2) Revert Yu Kuai's original fix 040b83fcecfb8 ("sbitmap: fix possible io hung due to lost wakeup") and my fixup 48c033314f37 ("sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()"). But then Keith would have to redo his batched accounting patches on top. > Some clever way to make the wait_cnt and wake_index adjustments atomic? > > Or is this sbitmap_queue_wake_up() interrupting sbitmap_queue_wake_up() > just supposed never to happen, the counts preventing it: but some > misaccounting letting it happen by mistake? No, I think that is in principle a situation that we have to accommodate. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR