On Wed, 2018-07-18 at 11:00 -0600, Keith Busch wrote: +AD4- void blk+AF8-sync+AF8-queue(struct request+AF8-queue +ACo-q) +AD4- +AHs- +AD4- - del+AF8-timer+AF8-sync(+ACY-q-+AD4-timeout)+ADs- +AD4- - cancel+AF8-work+AF8-sync(+ACY-q-+AD4-timeout+AF8-work)+ADs- +AD4- - +AD4- if (q-+AD4-mq+AF8-ops) +AHs- +AD4- struct blk+AF8-mq+AF8-hw+AF8-ctx +ACo-hctx+ADs- +AD4- int i+ADs- +AD4- +AEAAQA- -415,6 +-412,8 +AEAAQA- void blk+AF8-sync+AF8-queue(struct request+AF8-queue +ACo-q) +AD4- queue+AF8-for+AF8-each+AF8-hw+AF8-ctx(q, hctx, i) +AD4- cancel+AF8-delayed+AF8-work+AF8-sync(+ACY-hctx-+AD4-run+AF8-work)+ADs- +AD4- +AH0- else +AHs- +AD4- +- del+AF8-timer+AF8-sync(+ACY-q-+AD4-timeout)+ADs- +AD4- +- cancel+AF8-work+AF8-sync(+ACY-q-+AD4-timeout+AF8-work)+ADs- +AD4- cancel+AF8-delayed+AF8-work+AF8-sync(+ACY-q-+AD4-delay+AF8-work)+ADs- +AD4- +AH0- +AD4- +AH0- What is the impact of this change on the md driver, which is the only driver that calls blk+AF8-sync+AF8-queue() directly? What will happen if timeout processing happens concurrently with or after blk+AF8-sync+AF8-queue() has returned? +AD4- static void blk+AF8-mq+AF8-timeout+AF8-work(struct work+AF8-struct +ACo-work) +AD4- +AHs- +AD4- - struct request+AF8-queue +ACo-q +AD0- +AD4- - container+AF8-of(work, struct request+AF8-queue, timeout+AF8-work)+ADs- +AD4- +- struct blk+AF8-mq+AF8-tag+AF8-set +ACo-set +AD0- +AD4- +- container+AF8-of(work, struct blk+AF8-mq+AF8-tag+AF8-set, timeout+AF8-work)+ADs- +AD4- +- struct request+AF8-queue +ACo-q+ADs- +AD4- unsigned long next +AD0- 0+ADs- +AD4- struct blk+AF8-mq+AF8-hw+AF8-ctx +ACo-hctx+ADs- +AD4- int i+ADs- +AD4- +AD4- - /+ACo- A deadlock might occur if a request is stuck requiring a +AD4- - +ACo- timeout at the same time a queue freeze is waiting +AD4- - +ACo- completion, since the timeout code would not be able to +AD4- - +ACo- acquire the queue reference here. +AD4- - +ACo- +AD4- - +ACo- That's why we don't use blk+AF8-queue+AF8-enter here+ADs- instead, we use +AD4- - +ACo- percpu+AF8-ref+AF8-tryget directly, because we need to be able to +AD4- - +ACo- obtain a reference even in the short window between the queue +AD4- - +ACo- starting to freeze, by dropping the first reference in +AD4- - +ACo- blk+AF8-freeze+AF8-queue+AF8-start, and the moment the last request is +AD4- - +ACo- consumed, marked by the instant q+AF8-usage+AF8-counter reaches +AD4- - +ACo- zero. +AD4- - +ACo-/ +AD4- - if (+ACE-percpu+AF8-ref+AF8-tryget(+ACY-q-+AD4-q+AF8-usage+AF8-counter)) +AD4- - return+ADs- +AD4- - +AD4- - blk+AF8-mq+AF8-queue+AF8-tag+AF8-busy+AF8-iter(q, blk+AF8-mq+AF8-check+AF8-expired, +ACY-next)+ADs- +AD4- +- blk+AF8-mq+AF8-tagset+AF8-busy+AF8-iter(set, blk+AF8-mq+AF8-check+AF8-expired, +ACY-next)+ADs- +AD4- +AD4- if (next +ACEAPQ- 0) +AHs- +AD4- - mod+AF8-timer(+ACY-q-+AD4-timeout, next)+ADs- +AD4- - +AH0- else +AHs- +AD4- +- mod+AF8-timer(+ACY-set-+AD4-timer, next)+ADs- +AD4- +- return+ADs- +AD4- +- +AH0- +AD4- +- +AD4- +- list+AF8-for+AF8-each+AF8-entry(q, +ACY-set-+AD4-tag+AF8-list, tag+AF8-set+AF8-list) +AHs- +AD4- /+ACo- +AD4- +ACo- Request timeouts are handled as a forward rolling timer. If +AD4- +ACo- we end up here it means that no requests are pending and +AD4- +AEAAQA- -881,7 +-868,6 +AEAAQA- static void blk+AF8-mq+AF8-timeout+AF8-work(struct work+AF8-struct +ACo-work) +AD4- blk+AF8-mq+AF8-tag+AF8-idle(hctx)+ADs- +AD4- +AH0- +AD4- +AH0- +AD4- - blk+AF8-queue+AF8-exit(q)+ADs- +AD4- +AH0- What prevents that a request queue is removed from set-+AD4-tag+AF8-list while the above loop examines tag+AF8-list? Can blk+AF8-cleanup+AF8-queue() queue be called from the context of another thread while this loop is examining hardware queues? +AD4- +- timer+AF8-setup(+ACY-set-+AD4-timer, blk+AF8-mq+AF8-timed+AF8-out+AF8-timer, 0)+ADs- +AD4- +- INIT+AF8-WORK(+ACY-set-+AD4-timeout+AF8-work, blk+AF8-mq+AF8-timeout+AF8-work)+ADs- +AD4- +AFs- ... +AF0- +AD4- --- a/include/linux/blk-mq.h +AD4- +-+-+- b/include/linux/blk-mq.h +AD4- +AEAAQA- -86,6 +-86,8 +AEAAQA- struct blk+AF8-mq+AF8-tag+AF8-set +AHs- +AD4- +AD4- struct blk+AF8-mq+AF8-tags +ACoAKg-tags+ADs- +AD4- +AD4- +- struct timer+AF8-list timer+ADs- +AD4- +- struct work+AF8-struct timeout+AF8-work+ADs- Can the timer and timeout+AF8-work data structures be replaced by a single delayed+AF8-work instance? Thanks, Bart.