We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO.  This is because of the
has_sleeper optimization that is used to make sure we don't steal an
inflight counter for new submitters when there are people already on the
list.

There is a race between our check to see if the waitqueue has any
waiters (this is done locklessly) and the time we actually add ourselves
to the waitqueue.  If we lose this race we'll go to sleep and never be
woken up, because nobody is doing IO to wake us up.

Fix this by open coding prepare_to_wait_exclusive (yes, yes, I know) in
order to get a real value for has_sleeper.  This way we keep our
optimization in place and avoid hanging forever if there are no longer
any waiters on the list.

Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
 block/blk-rq-qos.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 659ccb8b693f..04590666f7c4 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -237,13 +237,18 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 		.cb = acquire_inflight_cb,
 		.private_data = private_data,
 	};
+	unsigned long flags;
 	bool has_sleeper;
 
 	has_sleeper = wq_has_sleeper(&rqw->wait);
 	if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
 		return;
 
-	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
+	spin_lock_irqsave(&rqw->wait.lock, flags);
+	has_sleeper = !list_empty(&rqw->wait.head);
+	__add_wait_queue_entry_tail_exclusive(&rqw->wait, &data.wq);
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	spin_unlock_irqrestore(&rqw->wait.lock, flags);
 	do {
 		if (data.got_token)
 			break;
-- 
2.17.1
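
For anyone who wants to poke at the race outside the kernel, here is a
rough userspace sketch of the idea behind the fix. It is not kernel
code: struct wait_list, struct waiter and every wait_list_*() helper
below are hypothetical stand-ins invented for illustration, and a C11
atomic_flag spinlock plays the role of rqw->wait.lock. The only point it
makes is that the "was anybody already queued?" answer is taken under
the same lock that protects the list, at the moment we add ourselves.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue entry. */
struct waiter {
	struct waiter *next;
};

/* Hypothetical stand-in for a wait queue head and its lock. */
struct wait_list {
	atomic_flag lock;
	struct waiter *head;
	struct waiter *tail;
};

void wait_list_init(struct wait_list *wl)
{
	atomic_flag_clear(&wl->lock);
	wl->head = NULL;
	wl->tail = NULL;
}

void wait_list_lock(struct wait_list *wl)
{
	/* Spin until we are the one who set the flag. */
	while (atomic_flag_test_and_set_explicit(&wl->lock,
						 memory_order_acquire))
		;
}

void wait_list_unlock(struct wait_list *wl)
{
	atomic_flag_clear_explicit(&wl->lock, memory_order_release);
}

/* Lockless peek: the answer may be stale by the time the caller uses it. */
bool wait_list_has_sleeper_lockless(const struct wait_list *wl)
{
	return wl->head != NULL;
}

/*
 * Append w to the tail and report whether the list was non-empty just
 * before the append.  Both steps happen under the lock, so the result
 * cannot go stale between the check and the insertion.
 */
bool wait_list_add_tail(struct wait_list *wl, struct waiter *w)
{
	bool had_sleeper;

	wait_list_lock(wl);
	had_sleeper = (wl->head != NULL);
	w->next = NULL;
	if (wl->tail)
		wl->tail->next = w;
	else
		wl->head = w;
	wl->tail = w;
	wait_list_unlock(wl);

	return had_sleeper;
}

/* Remove and return the first waiter, or NULL if the list is empty. */
struct waiter *wait_list_pop_head(struct wait_list *wl)
{
	struct waiter *w;

	wait_list_lock(wl);
	w = wl->head;
	if (w) {
		wl->head = w->next;
		if (!wl->head)
			wl->tail = NULL;
	}
	wait_list_unlock(wl);

	return w;
}
```

To replay the problematic interleaving by hand: queue one waiter, take
the lockless peek (it says there is a sleeper), pop that waiter as if it
had been woken and left, then add a second waiter. The value returned by
wait_list_add_tail() for the second waiter is false even though the
earlier peek said true, and that fresh value is the one the submitter
should act on before deciding to sleep.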