On Thu, Oct 03, 2024 at 08:30:36AM -0700, Keith Busch wrote: > +What: /sys/block/<disk>/queue/iostats_passthrough > +Date: October 2024 > +Contact: linux-block@xxxxxxxxxxxxxxx > +Description: > + [RW] This file is used to control (on/off) the iostats > + accounting of the disk for passthrough commands. > + > > What: /sys/block/<disk>/queue/logical_block_size > Date: May 2009 > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c > index 5463697a84428..d9d7fd441297e 100644 > --- a/block/blk-mq-debugfs.c > +++ b/block/blk-mq-debugfs.c > @@ -93,6 +93,7 @@ static const char *const blk_queue_flag_name[] = { > QUEUE_FLAG_NAME(RQ_ALLOC_TIME), > QUEUE_FLAG_NAME(HCTX_ACTIVE), > QUEUE_FLAG_NAME(SQ_SCHED), > + QUEUE_FLAG_NAME(IOSTATS_PASSTHROUGH), > }; > #undef QUEUE_FLAG_NAME > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 8e75e3471ea58..cf309b39bac04 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -993,13 +993,38 @@ static inline void blk_account_io_done(struct request *req, u64 now) > } > } > > +static inline bool blk_rq_passthrough_stats(struct request *req) > +{ > + struct bio *bio = req->bio; > + > + if (!blk_queue_passthrough_stat(req->q)) > + return false; > + > + /* > + * Stats are accumulated in the bdev part, so must have one attached to > + * a bio to do this > + */ > + if (!bio) > + return false; > + if (!bio->bi_bdev) > + return false; Missing. At the end of the sentence. But even then this doesn't explain why not accouting these requests is fine. - requests without a bio are all those that don't transfer data - requests with a bio but not bdev are almost all passthrough requests as far as I can tell, with the only exception of nvme I/O command passthrough. I.e. what we have here is a special casing for nvme I/O commands. Maybe that's fine, but the comments and commit log should leave a clearly visible trace of that and not confuse the hell out of people trying to understand the logic later. > + /* > + * Ensuring the size is aligned to the block size prevents observing an > + * invalid sectors stat. > + */ > + if (blk_rq_bytes(req) & (bdev_logical_block_size(bio->bi_bdev) - 1)) > + return false; Now this probably won't trigger anyway for the usual workload (although it might for odd NVMe command sets like KV and the SLM), but I'd expect the size to be rounded (probably up?) and not entirely dropped. > + ret = queue_var_store(&ios, page, count); > + if (ret < 0) > + return ret; > + > + if (ios) > + blk_queue_flag_set(QUEUE_FLAG_IOSTATS_PASSTHROUGH, > + disk->queue); > + else > + blk_queue_flag_clear(QUEUE_FLAG_IOSTATS_PASSTHROUGH, > + disk->queue); Why is this using queue flags now? This isn't really blk-mq internal, so it should be using queue_limits->flag as pointed out last round.