On Tue, Aug 29, 2017 at 10:54:17AM -0400, Keith Busch wrote: > On Wed, Aug 23, 2017 at 07:58:15PM +0200, Christoph Hellwig wrote: > > + /* Anything else could be a path failure, so should be retried */ > > + spin_lock_irqsave(&ns->head->requeue_lock, flags); > > + blk_steal_bios(&ns->head->requeue_list, req); > > + spin_unlock_irqrestore(&ns->head->requeue_lock, flags); > > + > > + nvme_reset_ctrl(ns->ctrl); > > + kblockd_schedule_work(&ns->head->requeue_work); > > + return true; > > +} > > It appears this isn't going to cause the path selection to failover for > the requeued work. The bio's bi_disk is unchanged from the failed path when the > requeue_work submits the bio again so it will use the same path, right? Oh. This did indeed break with the bi_bdev -> bi_disk refactoring I did just before sending this out. > It also looks like new submissions will get a new path only from the > fact that the original/primary is being reset. The controller reset > itself seems a bit heavy-handed. Can we just set head->current_path to > the next active controller in the list? For ANA we'll have to do that anyway, but if we got a failure that clearly indicates a path failure what benefit is there in not resetting the controller? But yeah, maybe we can just switch the path for non-ANA controllers and wait for timeouts to do their work. > > > +static void nvme_requeue_work(struct work_struct *work) > > +{ > > + struct nvme_ns_head *head = > > + container_of(work, struct nvme_ns_head, requeue_work); > > + struct bio *bio, *next; > > + > > + spin_lock_irq(&head->requeue_lock); > > + next = bio_list_get(&head->requeue_list); > > + spin_unlock_irq(&head->requeue_lock); > > + > > + while ((bio = next) != NULL) { > > + next = bio->bi_next; > > + bio->bi_next = NULL; > > + generic_make_request_fast(bio); > > + } > > +} > > Here, I think we need to reevaluate the path (nvme_find_path) and set > bio->bi_disk accordingly. Yes. Previously this was opencoded and always used head->disk, but I messed it up last minute. In the end it still worked for my cases because the controller would either already be reset or fail all I/O, but this behavior clearly is not intended and suboptimal.