On 3/23/21 8:31 AM, Sagi Grimberg wrote:
>> Actually, I had been playing around with marking the entire bio as
>> 'NOWAIT'; that would avoid the tag stall, too:
>>
>> @@ -313,7 +316,7 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
>>  	ns = nvme_find_path(head);
>>  	if (likely(ns)) {
>>  		bio_set_dev(bio, ns->disk->part0);
>> -		bio->bi_opf |= REQ_NVME_MPATH;
>> +		bio->bi_opf |= REQ_NVME_MPATH | REQ_NOWAIT;
>>  		trace_block_bio_remap(bio, disk_devt(ns->head->disk),
>>  				      bio->bi_iter.bi_sector);
>>  		ret = submit_bio_noacct(bio);
>>
>> My only worry here is that we might incur spurious failures under high
>> load; but then this is not necessarily a bad thing.
>
> What? Making spurious failures is not OK under any load. What fs will
> take into account that you may have run out of tags?
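
Just to recap what REQ_NOWAIT buys us when the tags are exhausted:
blk-mq will not sleep in tag allocation, but completes the bio straight
away with BLK_STS_AGAIN. Roughly (a simplified paraphrase of the
blk_mq_submit_bio() / bio_wouldblock_error() logic, not the literal
upstream code):

	rq = __blk_mq_alloc_request(&data);
	if (!rq) {
		/* no tag available */
		if (bio->bi_opf & REQ_NOWAIT) {
			/* bio_wouldblock_error(): fail fast instead of sleeping */
			bio->bi_status = BLK_STS_AGAIN;
			bio_endio(bio);
			return BLK_QC_T_NONE;
		}
		/* ... otherwise the submitter would have slept for a tag */
	}
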
Well, it's not actually a spurious failure, but rather a spurious
failover: we're still in a multipath scenario, and those bios will simply
be re-routed to other paths, or queued if all paths are out of tags.
Hence the filesystem would not see any difference in behaviour.
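
To make the "spurious failover" idea concrete: this is roughly how I
would picture catching a BLK_STS_AGAIN completion on the head side and
feeding the bio back into the existing requeue machinery. Just a sketch
on top of the nvme_ns_head requeue_list/requeue_work bits, not
something we have actually posted or tested:

	/*
	 * Sketch only: turn a NOWAIT failure on one path into a requeue
	 * on the ns_head, so that nvme_requeue_work() resubmits the bio,
	 * possibly via a different path.
	 */
	static void nvme_mpath_retry_nowait_bio(struct nvme_ns_head *head,
						struct bio *bio)
	{
		unsigned long flags;

		if (bio->bi_status != BLK_STS_AGAIN)
			return;	/* genuine errors take the normal failover path */

		bio->bi_status = BLK_STS_OK;

		spin_lock_irqsave(&head->requeue_lock, flags);
		bio_list_add(&head->requeue_list, bio);
		spin_unlock_irqrestore(&head->requeue_lock, flags);

		/* requeue_work ends up in nvme_ns_head_submit_bio(), which
		 * calls nvme_find_path() again */
		kblockd_schedule_work(&head->requeue_work);
	}
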
But in the end, we abandoned this attempt, as the crash we've been
seeing was in bio_endio (due to bi_bdev still pointing to the removed
path device):
[ 6552.155251] bio_endio+0x74/0x120
[ 6552.155260] nvme_ns_head_submit_bio+0x36f/0x3e0 [nvme_core]
[ 6552.155271] submit_bio_noacct+0x175/0x490
[ 6552.155284] ? nvme_requeue_work+0x5a/0x70 [nvme_core]
[ 6552.155290] nvme_requeue_work+0x5a/0x70 [nvme_core]
[ 6552.155296] process_one_work+0x1f4/0x3e0
[ 6552.155299] worker_thread+0x2d/0x3e0
[ 6552.155302] ? process_one_work+0x3e0/0x3e0
[ 6552.155305] kthread+0x10d/0x130
[ 6552.155307] ? kthread_park+0xa0/0xa0
[ 6552.155311] ret_from_fork+0x35/0x40
So we're not blocked on blk_queue_enter(), and it's a crash, not a
deadlock. Blocking on blk_queue_enter() certainly plays a part here,
but it seems not to be the full picture.
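
FWIW, the obvious band-aid for the crash above would be to make sure
the bio no longer references the (possibly already removed) path device
before we complete it, e.g. by pointing it back at the ns_head disk.
Again just a sketch, not a tested patch:

	/*
	 * Sketch: complete a multipath bio against the ns_head gendisk,
	 * which outlives the individual paths, instead of the path
	 * device that bi_bdev may still reference.
	 */
	static void nvme_ns_head_fail_bio(struct nvme_ns_head *head,
					  struct bio *bio)
	{
		bio_set_dev(bio, head->disk->part0);
		bio->bi_status = BLK_STS_IOERR;
		bio_endio(bio);
	}
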
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer