On Thu, Apr 20, 2017 at 02:16:04PM -0600, Jens Axboe wrote:
> On 04/20/2017 02:07 PM, Omar Sandoval wrote:
> > On Fri, Apr 07, 2017 at 06:24:03AM -0600, sbates@xxxxxxxxxxxx wrote:
> >> From: Stephen Bates <sbates@xxxxxxxxxxxx>
> >>
> >> Rather than bucketing IO statistics based on direction only, we also
> >> bucket based on the IO size. This leads to improved polling
> >> performance. Update the bucket callback function and use it in the
> >> polling latency estimation.
> >>
> >> Signed-off-by: Stephen Bates <sbates@xxxxxxxxxxxx>
> >
> > Hey, Stephen, just taking a look at this now. Comments below.
> >
> >> ---
> >>  block/blk-mq.c | 45 +++++++++++++++++++++++++++++++++++----------
> >>  1 file changed, 35 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 061fc2c..5fd376b 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -42,6 +42,25 @@ static LIST_HEAD(all_q_list);
> >>  static void blk_mq_poll_stats_start(struct request_queue *q);
> >>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> >>
> >> +/* Must be consistent with function below */
> >> +#define BLK_MQ_POLL_STATS_BKTS 16
> >> +static int blk_mq_poll_stats_bkt(const struct request *rq)
> >> +{
> >> +	int ddir, bytes, bucket;
> >> +
> >> +	ddir = blk_stat_rq_ddir(rq);
> >
> > No need to call the wrapper function here, we can use rq_data_dir()
> > directly.
> >
> >> +	bytes = blk_rq_bytes(rq);
> >> +
> >> +	bucket = ddir + 2*(ilog2(bytes) - 9);
> >> +
> >> +	if (bucket < 0)
> >> +		return -1;
> >> +	else if (bucket >= BLK_MQ_POLL_STATS_BKTS)
> >> +		return ddir + BLK_MQ_POLL_STATS_BKTS - 2;
> >> +
> >> +	return bucket;
> >> +}
> >
> > Nitpicking here, but defining things in terms of the number of size
> > buckets seems more natural to me. How about something like this
> > (untested)? Note that this obviates the need for patch 1.
> >
> > #define BLK_MQ_POLL_STATS_SIZE_BKTS 8
> > static unsigned int blk_mq_poll_stats_bkt(const struct request *rq)
> > {
> > 	unsigned int size_bucket;
> >
> > 	size_bucket = clamp(ilog2(blk_rq_bytes(rq)) - 9, 0,
> > 			    BLK_MQ_POLL_STATS_SIZE_BKTS - 1);
> > 	return 2 * size_bucket + rq_data_dir(rq);
> > }
>
> As I wrote in an earlier reply, it would be a lot cleaner to just have
> the buckets be:
>
> 	buckets[2][BUCKETS_PER_RW];
>
> and not have to do weird math based on both size and data direction.
> Just have it return the bucket index based on size, and have the caller
> do:
>
> 	bucket[rq_data_dir(rq)][bucket_index];

This removes a lot of the flexibility of the interface. Kyber, for one,
has this stats callback:

static unsigned int rq_sched_domain(const struct request *rq)
{
	unsigned int op = rq->cmd_flags;

	if ((op & REQ_OP_MASK) == REQ_OP_READ)
		return KYBER_READ;
	else if ((op & REQ_OP_MASK) == REQ_OP_WRITE && op_is_sync(op))
		return KYBER_SYNC_WRITE;
	else
		return KYBER_OTHER;
}

The buckets aren't subdivisions of read vs. write. We could shoehorn it
in your way if we really wanted to, but that's pointless.
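
As a concrete check of the bucket formula in the patch: a 4 KiB read
gives ilog2(4096) = 12, so bucket = 0 + 2 * (12 - 9) = 6, and the
matching 4 KiB write lands in bucket 7. A 512-byte request maps to
bucket 0 or 1, a 64 KiB request to 14 or 15, and with
BLK_MQ_POLL_STATS_BKTS = 16 anything larger is clamped into that same
top pair, so each size class always occupies an adjacent even/odd
(read/write) slot.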
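
For comparison, a minimal sketch of the two-dimensional layout Jens
describes; the names blk_mq_poll_size_bkt and BUCKETS_PER_RW here are
placeholders following his wording, not identifiers from the actual
patch:

#define BUCKETS_PER_RW 8

static unsigned int blk_mq_poll_size_bkt(const struct request *rq)
{
	/* 512 bytes and below map to 0; clamp at the top size class */
	return clamp(ilog2(blk_rq_bytes(rq)) - 9, 0, BUCKETS_PER_RW - 1);
}

with the caller then indexing direction itself, along the lines of:

	struct blk_rq_stat stat[2][BUCKETS_PER_RW];
	...
	s = &stat[rq_data_dir(rq)][blk_mq_poll_size_bkt(rq)];

The tradeoff Omar raises is that this bakes the read/write split into
the stats machinery itself, while the flat single-index callback leaves
the meaning of a bucket entirely up to the consumer.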
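
To see why the flat index is enough for Kyber: the bucket function is
simply handed to the generic stats code along with a bucket count, so
blk-stat never needs to know what the buckets mean. Going from memory
of the blk-stat API after Omar's rework (the exact names and signature
are an assumption here, so treat this as a sketch), the registration
looks something like:

	/* three buckets: KYBER_READ, KYBER_SYNC_WRITE, KYBER_OTHER */
	kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn,
					  rq_sched_domain,
					  KYBER_NUM_DOMAINS, kqd);

Nothing in that interface assumes buckets come in read/write pairs,
which is exactly the flexibility Omar is pointing at.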