On Mon, Oct 03, 2022 at 12:43:44PM +0300, Sagi Grimberg wrote:
> Our mpath stack device is just a shim that selects a bottom namespace
> and submits the bio to it without any fancy splitting. This also means
> that we don't clone the bio or have any context to the bio beyond
> submission. However it really sucks that we don't see the mpath device
> io stats.
>
> Given that the mpath device can't do that without adding some context
> to it, we let the bottom device do it on its behalf (somewhat similar
> to the approach taken in nvme_trace_bio_complete).
>
> When the IO starts, we account the request for multipath IO stats using
> the REQ_NVME_MPATH_IO_STATS nvme_request flag, so that disabling queue
> io stats in the middle of the request cannot skew the accounting.

An unfortunate side effect is that a successful error failover will be
accounted twice in the mpath device, but cloning just to create a
separate context for tracking iostats in that unusual condition would
be much worse.

Reviewed-by: Keith Busch <kbusch@xxxxxxxxxx>

> +void nvme_mpath_start_request(struct request *rq)
> +{
> +	struct nvme_ns *ns = rq->q->queuedata;
> +	struct gendisk *disk = ns->head->disk;
> +
> +	if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
> +		return;
> +
> +	nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
> +	nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0,
> +				blk_rq_bytes(rq) >> SECTOR_SHIFT,
> +				req_op(rq), jiffies);
> +}
> +
> +void nvme_mpath_end_request(struct request *rq)
> +{
> +	struct nvme_ns *ns = rq->q->queuedata;
> +
> +	if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
> +		return;
> +	bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
> +		nvme_req(rq)->start_time);
> +}

I think these also can be static inline.