Hi,
On 2022/09/26 19:44, Jan Kara wrote:
On Fri 23-09-22 16:15:29, Hugh Dickins wrote:
On Fri, 23 Sep 2022, Hugh Dickins wrote:
On Fri, 23 Sep 2022, Keith Busch wrote:
Does the following fix the observation? The rationale is that there's no
reason to spin on the current wait state that is already being handled; let
subsequent clearings proceed immediately to the next, inevitable wait state.
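(For context, a minimal sketch of the shape of that change, reconstructed
from the discussion rather than the exact posted diff; sbq_wake_ptr(),
sbq_index_atomic_inc() and the sbq_wait_state fields are the existing
lib/sbitmap.c ones:)

/*
 * Sketch only: claim the batch with a cmpxchg loop. A caller that
 * observes wait_cnt == 0 knows another caller owns this wake-up and
 * returns true, so sbitmap_queue_wake_up() retries, expecting
 * sbq_wake_ptr() to move on to the next wait state instead of
 * spinning on the one already under handling.
 */
static bool __sbq_wake_up(struct sbitmap_queue *sbq)
{
        struct sbq_wait_state *ws = sbq_wake_ptr(sbq);
        unsigned int wake_batch;
        int wait_cnt;

        if (!ws)
                return false;

        wait_cnt = atomic_read(&ws->wait_cnt);
        do {
                /* Batch already claimed elsewhere: move on. */
                if (wait_cnt <= 0)
                        return true;
        } while (!atomic_try_cmpxchg(&ws->wait_cnt, &wait_cnt, wait_cnt - 1));

        /* Batch not yet exhausted: nothing to wake. */
        if (wait_cnt > 1)
                return false;

        /* We took the last count (wait_cnt is now 0): do the wake-up. */
        wake_batch = READ_ONCE(sbq->wake_batch);
        wake_up_nr(&ws->wait, wake_batch);
        /* The window below is where an interrupt can still observe
         * wait_cnt == 0 with wake_index not yet advanced. */
        sbq_index_atomic_inc(&sbq->wake_index);
        atomic_set(&ws->wait_cnt, wake_batch);
        return false;
}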
It's running fine without lockup so far; but doesn't this change merely
narrow the window? If this is interrupted in between atomic_try_cmpxchg()
setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
don't we run the same risk as before, of sbitmap_queue_wake_up() from
the interrupt handler getting stuck on that wait_cnt 0?
Yes, it ran successfully for 50 minutes, then an interrupt came in
immediately after the cmpxchg, and it locked up just as before.
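Concretely, the interleaving looks like this (a reconstruction from the
description above, assuming the sketch of the change given earlier):

        task context                        interrupt on same CPU
        ------------                        ---------------------
        __sbq_wake_up()
          atomic_try_cmpxchg()
            -> ws->wait_cnt becomes 0
                                            sbitmap_queue_wake_up()
                                              sbq_wake_ptr() -> same ws
                                              (wake_index not yet advanced)
                                              sees wait_cnt == 0, retries
                                              ... spins on this ws forever
          sbq_index_atomic_inc()
            (never reached while the handler spins)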
Easily dealt with by disabling interrupts, no doubt, but I assume it's a
badge of honour not to disable interrupts here (except perhaps in waking).
I don't think any magic with sbq_index_atomic_inc() is going to reliably
fix this. After all, the current waitqueue may be the only one that has
active waiters, so sbq_wake_ptr() will always end up returning this
waitqueue regardless of the current value of sbq->wake_index.
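(For reference, sbq_wake_ptr() of that era, lightly abridged from
lib/sbitmap.c; the comment marks why the scan keeps landing on the same
waitqueue:)

static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
{
        int i, wake_index;

        if (!atomic_read(&sbq->ws_active))
                return NULL;

        wake_index = atomic_read(&sbq->wake_index);
        for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                struct sbq_wait_state *ws = &sbq->ws[wake_index];

                /*
                 * If this is the only waitqueue with active waiters,
                 * the scan returns it no matter where wake_index points.
                 */
                if (waitqueue_active(&ws->wait)) {
                        if (wake_index != atomic_read(&sbq->wake_index))
                                atomic_set(&sbq->wake_index, wake_index);
                        return ws;
                }

                wake_index = sbq_index_inc(wake_index);
        }

        return NULL;
}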
Honestly, this whole code needs a serious redesign. I have some
simplifications in mind, but they will take some thinking and benchmarking,
so we need a fix for the interim. I pondered for quite some time some
band-aid for the problem you've found, but didn't find anything
satisfactory.
In the end I see two options:
1) Take your patch (as wrong as it is ;). Yes, it can lead to lost wakeups,
but we have been living with those for a relatively long time, so we can
probably live with them a while longer.
2) Revert Yu Kuai's original fix 040b83fcecfb8 ("sbitmap: fix possible io
hung due to lost wakeup") and my fixup 48c033314f37 ("sbitmap: Avoid leaving
waitqueue in invalid state in __sbq_wake_up()"). But then Keith would have
to redo his batched accounting patches on top.
Some clever way to make the wait_cnt and wake_index adjustments atomic?
I'm thinking about a hacky way to make the update of wait_cnt and
wake_index atomic; however, a redesign of sbitmap_queue is probably
better. 🤣
There are only 8 wait queues and wake_batch is at most 8, thus
only 3 * 9 = 27 bits are needed, and a single atomic value is enough:
- bits 0-2 represent ws[0].wait_cnt
- bits 3-5 represent ws[1].wait_cnt
- ...
- bits 21-23 represent ws[7].wait_cnt
- bits 24-26 represent sbq->wake_index
For example, assume the atomic value is:
0b 111 111 111 111 111 111 111 111 000,
which means wake_index is 7 and ws[0].wait_cnt is 0.
If we try to inc wake_index and reset wait_cnt together:
atomic_add(..., 0b 001 000 000 000 000 000 000 000 111)
(the carry out of bit 26 lands in the unused upper bits and is masked
off whenever wake_index is read, so the index wraps mod 8)
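A rough sketch of that packing, with hypothetical names (it assumes the
3-bit fields only ever hold wait_cnt values 0-7 and that callers never
let a field underflow or overflow into its neighbour):

#include <linux/atomic.h>

#define SBQ_WAIT_QUEUES         8
#define SBQ_CNT_BITS            3
#define SBQ_FIELD_MASK          ((1U << SBQ_CNT_BITS) - 1)
#define SBQ_INDEX_SHIFT         (SBQ_WAIT_QUEUES * SBQ_CNT_BITS)   /* 24 */

/* Hypothetical: all eight wait_cnt fields plus wake_index in one word. */
static atomic_t sbq_packed_state;

static inline unsigned int sbq_packed_wake_index(unsigned int v)
{
        /* The mask also discards carries that incs pushed past bit 26. */
        return (v >> SBQ_INDEX_SHIFT) & SBQ_FIELD_MASK;
}

static inline unsigned int sbq_packed_wait_cnt(unsigned int v, int i)
{
        return (v >> (i * SBQ_CNT_BITS)) & SBQ_FIELD_MASK;
}

/*
 * Advance wake_index and refill ws[i].wait_cnt in one atomic step.
 * For the example above (wake_index == 7, ws[0].wait_cnt == 0), adding
 * (1 << 24) | 7 wraps wake_index to 0 (via the masked-off carry) and
 * sets ws[0].wait_cnt to 7 at the same time.
 */
static inline void sbq_packed_inc_index_reset_cnt(int i, unsigned int batch)
{
        atomic_add((1U << SBQ_INDEX_SHIFT) | (batch << (i * SBQ_CNT_BITS)),
                   &sbq_packed_state);
}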
Thanks,
Kuai
Or is this sbitmap_queue_wake_up() interrupting sbitmap_queue_wake_up()
just supposed never to happen, the counts preventing it, but some
misaccounting letting it happen by mistake?
No, I think that is in principle a situation that we have to accommodate.
Honza