2011/10/7 Jeff Moyer <jmoyer@xxxxxxxxxx>: > Christophe Saout <christophe@xxxxxxxx> writes: > >> Hi Jeff, >> >>> Anyway, it would help a great deal if you could retrigger the failure >>> and provide the full failure output. You can get that by issuing the >>> 'dmesg' command and redirecting it to a file. >> >> Oh, sorry, yes, there's a line missing. >> >> Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail); > > OK, it turns out my testing was incomplete. I only tested targets that > had a write-through cache, so I didn't hit this problem. It reproduces > pretty easily with just multipath involved (no linear target on top) when > running against the right storage. > > So, here's a patch, but I don't have a full explanation for it just yet. > What I observed was that, on fsync, blkdev_issue_flush was called. > Eventually, the flush request gets cloned, and blk_insert_cloned_request > is called. This cloned request never actually gets issued to the > q->requst_fn (scsi_request_fn in my case). So, it may be that there is > no plug list for this, so the queue isn't goosed? I'll try to come up > with a better explanation, or Tejun may just know off the top of his > head what's going on. blk_insert_flush() just insert request to list and doesn't actually dispatch request to drive, because normally there is other way to run queue later. But blk_insert_cloned_request() actually means dispatch request to drive. If we don't flush queue, the queue will stall. > So, the patch works for me, but is very much just an RFC. > > Cheers, > Jeff > > Signed-off-by: Jeff Moyer <jmoyer@xxxxxxxxxx> > > diff --git a/block/blk-flush.c b/block/blk-flush.c > index 491eb30..7aa4736 100644 > --- a/block/blk-flush.c > +++ b/block/blk-flush.c > @@ -320,7 +320,7 @@ void blk_insert_flush(struct request *rq) > return; > } > > - BUG_ON(!rq->bio || rq->bio != rq->biotail); > + BUG_ON(rq->bio && rq->bio != rq->biotail); > > /* > * If there's data but flush is not necessary, the request can be > @@ -345,6 +345,12 @@ void blk_insert_flush(struct request *rq) > rq->end_io = flush_data_end_io; > > blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0); > + > + /* > + * A cloned empty flush needs a queue kick to make progress. > + */ > + if (!rq->bio) > + blk_run_queue_async(q); > } the rq could be a cloned FUA request, so rq->bio could not be NULL. Better move blk_run_queue_async() to blk_insert_cloned_request(). We need run the queue in flush case anyway. -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel