On 2022/06/02 12:35, lijiang wrote: > On Wed, Jun 1, 2022 at 4:51 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx <mailto:k-hagio-ab@xxxxxxx>> wrote: > > Hi Lianbo, > > Thank you for the update, much better structure :) > > Let's go into detail... > > > Thank you for helping the review, it seems that we are getting closer to this goal. yes, thanks to your hard work :) > > On 2022/05/30 18:15, Lianbo Jiang wrote: > > Currently, crash doesn't support to display disk I/O statistics > > for blk-mq devices. For more details, please refer to the following > > commit: <98b417fc6346> ("Handle blk_mq_ctx member changes for kernels > > 5.16-rc1 and later"). > > > > Lets parse the bitmap in blk-mq layer to achieve it. > > > > Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx <mailto:lijiang@xxxxxxxxxx>> > > --- > > defs.h | 11 +++ > > dev.c | 261 ++++++++++++++++++++++++++++++++++++++++++++++-------- > > symbols.c | 22 +++++ > > 3 files changed, 255 insertions(+), 39 deletions(-) > > > > diff --git a/defs.h b/defs.h > > index c8444b4e54eb..2681586a33dc 100644 > > --- a/defs.h > > +++ b/defs.h > > @@ -2170,6 +2170,16 @@ struct offset_table { /* stash of commonly-used offsets */ > > long sbq_wait_state_wait; > > long sbitmap_alloc_hint; > > long sbitmap_round_robin; > > + long request_cmd_flags; > > + long request_q; > > + long request_state; > > + long request_queue_queue_hw_ctx; > > + long request_queue_nr_hw_queues; > > + long blk_mq_hw_ctx_tags; > > + long blk_mq_tags_bitmap_tags; > > + long blk_mq_tags_breserved_tags; > > + long blk_mq_tags_nr_reserved_tags; > > + long blk_mq_tags_rqs; > > }; > > > > struct size_table { /* stash of commonly-used sizes */ > > @@ -2339,6 +2349,7 @@ struct size_table { /* stash of commonly-used sizes */ > > long sbitmap; > > long sbitmap_queue; > > long sbq_wait_state; > > + long blk_mq_tags; > > }; > > > > struct array_table { > > diff --git a/dev.c b/dev.c > > index a493e51ac95c..c0240f7b889b 100644 > > --- a/dev.c > > +++ b/dev.c > > @@ -4238,19 +4238,193 @@ get_one_mctx_diskio(unsigned long mctx, struct diskio *io) > > io->write = (dispatch[1] - comp[1]); > > } > > > > +typedef bool (busy_tag_iter_fn)(ulong rq, void *data); > > + > > +struct bt_iter_data { > > I think that some names are confusing and not synced with the kernel. > How about renaming to "mq_inflight"? > > > + ulong q; > > + struct diskio *dio; > > +}; > > + > > +struct bt_tags_iter_data { > > And please rename this to "bt_iter_data". > > > + ulong tags; > > + uint flags; > > rename to "reserved". > > > OK, I will rename the above three variables. > > > + uint nr_reserved_tags; > > + busy_tag_iter_fn *fn; > > + void *data; > > +}; > > + > > +/* > > + * See the include/linux/blk_types.h and include/linux/blk-mq.h > > + */ > > +#define MQ_RQ_IN_FLIGHT 1 > > +#define REQ_OP_BITS 8 > > +#define REQ_OP_MASK ((1 << REQ_OP_BITS) - 1) > > + > > +static uint op_is_write(uint op) > > +{ > > + return (op & REQ_OP_MASK) & 1; > > +} > > + > > +static bool mq_check_inflight(ulong rq, void *data) > > +{ > > + uint cmd_flags = 0, state = 0; > > + ulong addr = 0, queue = 0; > > + struct bt_iter_data *iter = data; > > + > > + if (!IS_KVADDR(rq)) > > + return TRUE; > > Why return TRUE? > Does this mean that rq can be invalid with bit on? > > The intent is to filter out the zero address of rq, and crash cannot completely believe that the address > of rq is always valid. To be on the safe side, it should be good to do this check. > In addition, the kernel has the similar logic of code, for example: > static bool bt_tags_iter(...) > { > ... > if (!rq) > return true; > ^^^^ > ... > } Got it, thanks. I missed this comment: /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assigning ->rqs[]. */ > > > + > > + addr = rq + OFFSET(request_q); > > + if (!readmem(addr, KVADDR, &queue, sizeof(ulong), "request.q", RETURN_ON_ERROR)) > > + return FALSE; > > + > > + addr = rq + OFFSET(request_cmd_flags); > > + if (!readmem(addr, KVADDR, &cmd_flags, sizeof(uint), "request.cmd_flags", RETURN_ON_ERROR)) > > + return FALSE; > > + > > + addr = rq + OFFSET(request_state); > > + if (!readmem(addr, KVADDR, &state, sizeof(uint), "request.state", RETURN_ON_ERROR)) > > + return FALSE; > > Rethinking these, RETURN_ON_ERROR leads to stop counting (i.e. wrong results) > with error messages. I think it would be better to use FAULT_ON_ERROR unless > we can ignore or handle an error correctly. What do you think? > > (And for the RETURN_ON_ERRORs below, same as above.) > > At first, I had the same opinion as yours. But eventually, called the readmem() with the "RETURN_ON_ERROR", > the reason is that crash can help to output the statistics as much as possible, which is helpful to user, and once the > readmem() failed, it will print an appropriate error, and it can explicitly tell users that the statistics may be unreliable. Agree. I tested and got this: crash> dev -d MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC dev: invalid kernel virtual address: 0 type: "request.q" 252 ffff88ee0188c800 vda ffff88ee11588ff0 0 0 0 253 ffff88ee109fa200 dm-0 ffff88ee01a7aa80 0 0 0 253 ffff88ee109fb000 dm-1 ffff88ee01a7da50 12 12 0 It would be better to print the remains as much as possible, let's go with RETURN_ON_ERROR. > > If this is really confusing to the user, let's replace it with the "FAULT_ON_ERROR". > > > + > > + if (queue == iter->q && state == MQ_RQ_IN_FLIGHT) { > > + if (op_is_write(cmd_flags)) > > + iter->dio->write++; > > + else > > + iter->dio->read++; > > + } > > + > > + return TRUE; > > +} > > + > > +static bool bt_tags_iter(uint bitnr, void *data) I missed this, please rename also this to bt_iter, according to the kernel path that we follow. Thanks, Kazu > > +{ > > + ulong addr = 0, rqs_addr = 0, rq = 0; > > + struct bt_tags_iter_data *tag_iter = data; > > + ulong tag = tag_iter->tags; > > + > > + if (!tag_iter->flags) > > + bitnr += tag_iter->nr_reserved_tags; > > + > > + /* rqs */ > > + addr = tag + OFFSET(blk_mq_tags_rqs); > > + if (!readmem(addr, KVADDR, &rqs_addr, sizeof(void *), "blk_mq_tags.rqs", RETURN_ON_ERROR)) > > + return FALSE; > > + > > + addr = rqs_addr + bitnr * sizeof(ulong); /* rqs[bitnr] */ > > + if (!readmem(addr, KVADDR, &rq, sizeof(ulong), "blk_mq_tags.rqs[]", RETURN_ON_ERROR)) > > + return FALSE; > > + > > + return tag_iter->fn(rq, tag_iter->data); > > +} > > + > > +static void bt_tags_for_each(ulong q, ulong tags, ulong sbq, uint flags, uint nr_resvd_tags, struct diskio *dio) > > Please rename this to bt_for_each. > > No problem. > > > +{ > > + struct sbitmap_context sc = {0}; > > > + struct diskio tmp = {0}; > > This tmp can be removed? > > (The dio->{read,write} are incremented in mq_check_inflight(), so > we don't need the local tmps for each function, I think) > > Let me double check. > > > + struct bt_iter_data iter_data = { > > + .q = q, > > + .dio = &tmp, > > + }; > > + struct bt_tags_iter_data tags_iter_data = { > > + .tags = tags, > > + .flags = flags, > > + .nr_reserved_tags = nr_resvd_tags, > > + .fn = mq_check_inflight, > > + .data = &iter_data, > > + }; > > + > > + sbitmap_context_load(sbq + OFFSET(sbitmap_queue_sb), &sc); > > + sbitmap_for_each_set(&sc, bt_tags_iter, &tags_iter_data); > > + > > + dio->read = tmp.read; > > + dio->write = tmp.write; > > +} > > + > > +static void queue_for_each_hw_ctx(ulong q, ulong *hctx, uint cnt, struct diskio *dio) > > +{ > > + uint i; > > > + struct diskio tmp = {0}; > > This also can be removed. > > > + > > + if (!hctx || !dio) > > + error(FATAL, "%s: Invalid parameter.\n", __func__); > > This can be used? > > Will remove it. > > > + > > + for (i = 0; i < cnt; i++) { > > + ulong addr = 0, tags = 0; > > + uint nr_reserved_tags = 0; > > + > > > + if (!IS_KVADDR(hctx[i])) > > + continue; > > Is this the remain of the v1 patch? > > > Yes, this can be removed from the 'for loop'. > > > + > > + /* Tags owned by the block driver */ > > + addr = hctx[i] + OFFSET(blk_mq_hw_ctx_tags); > > + if (!readmem(addr, KVADDR, &tags, sizeof(ulong), > > + "blk_mq_hw_ctx.tags", RETURN_ON_ERROR)) > > + break; > > + > > + addr = tags + OFFSET(blk_mq_tags_nr_reserved_tags); > > + if (!readmem(addr, KVADDR, &nr_reserved_tags, sizeof(uint), > > + "blk_mq_tags_nr_reserved_tags", RETURN_ON_ERROR)) > > + break; > > + > > + if (nr_reserved_tags) { > > + addr = tags + OFFSET(blk_mq_tags_breserved_tags); > > + bt_tags_for_each(q, tags, addr, 1, nr_reserved_tags, &tmp); > > + } > > + addr = tags + OFFSET(blk_mq_tags_bitmap_tags); > > + bt_tags_for_each(q, tags, addr, 0, nr_reserved_tags, &tmp); > > (the local tmp set by the nr_reserved_tags case is overwritten here.) > > Thanks for pointing out this issue. It should be accumulated in the bt_tags_for_each() as below: > + dio->read += tmp.read; > + dio->write += tmp.write; > > But anyway, I will rethink the variable 'tmp' and see what is the good way. > > Thanks. > Lianbo > > > + } > > + > > + dio->read = tmp.read; > > + dio->write = tmp.write; > > +} > > + > > +static void get_mq_diskio_from_hw_queues(ulong q, struct diskio *dio) > > +{ > > + uint cnt = 0; > > + ulong addr = 0, hctx_addr = 0; > > + ulong *hctx_array = NULL; > > > + struct diskio tmp = {0}; > > This also can be removed. > > Ditto. > > > + > > + addr = q + OFFSET(request_queue_nr_hw_queues); > > + readmem(addr, KVADDR, &cnt, sizeof(uint), > > + "request_queue.nr_hw_queues", FAULT_ON_ERROR); > > + > > + addr = q + OFFSET(request_queue_queue_hw_ctx); > > + readmem(addr, KVADDR, &hctx_addr, sizeof(void *), > > + "request_queue.queue_hw_ctx", FAULT_ON_ERROR); > > + > > + hctx_array = (ulong *)GETBUF(sizeof(void *) * cnt); > > + if (!hctx_array) > > + error(FATAL, "fail to get memory for the hctx_array\n"); > > + > > + if (!readmem(hctx_addr, KVADDR, hctx_array, sizeof(void *) * cnt, > > + "request_queue.queue_hw_ctx[]", RETURN_ON_ERROR)) { > > + FREEBUF(hctx_array); > > + return; > > + } > > + > > + queue_for_each_hw_ctx(q, hctx_array, cnt, &tmp); > > + dio->read = tmp.read; > > + dio->write = tmp.write; > > + > > + FREEBUF(hctx_array); > > +} > > + > > static void > > get_mq_diskio(unsigned long q, unsigned long *mq_count) > > { > > int cpu; > > unsigned long queue_ctx; > > unsigned long mctx_addr; > > - struct diskio tmp; > > + struct diskio tmp = {0}; > > > > if (INVALID_MEMBER(blk_mq_ctx_rq_dispatched) || > > - INVALID_MEMBER(blk_mq_ctx_rq_completed)) > > + INVALID_MEMBER(blk_mq_ctx_rq_completed)) { > > + get_mq_diskio_from_hw_queues(q, &tmp); > > + mq_count[0] = tmp.read; > > + mq_count[1] = tmp.write; > > return; > > - > > - memset(&tmp, 0x00, sizeof(struct diskio)); > > + } > > > > readmem(q + OFFSET(request_queue_queue_ctx), KVADDR, &queue_ctx, > > sizeof(ulong), "request_queue.queue_ctx", > > @@ -4479,41 +4653,24 @@ display_one_diskio(struct iter *i, unsigned long gendisk, ulong flags) > > && (io.read + io.write == 0)) > > return; > > > > - if (use_mq_interface(queue_addr) && > > - (INVALID_MEMBER(blk_mq_ctx_rq_dispatched) || > > - INVALID_MEMBER(blk_mq_ctx_rq_completed))) > > - fprintf(fp, "%s%s%s %s%s%s%s %s%s%s", > > - mkstring(buf0, 5, RJUST|INT_DEC, (char *)(unsigned long)major), > > - space(MINSPACE), > > - mkstring(buf1, VADDR_PRLEN, LJUST|LONG_HEX, (char *)gendisk), > > - space(MINSPACE), > > - mkstring(buf2, 10, LJUST, disk_name), > > - space(MINSPACE), > > - mkstring(buf3, VADDR_PRLEN <= 11 ? 11 : VADDR_PRLEN, > > - LJUST|LONG_HEX, (char *)queue_addr), > > - space(MINSPACE), > > - mkstring(buf4, 17, RJUST, "(not supported)"), > > - space(MINSPACE)); > > - > > - else > > - fprintf(fp, "%s%s%s %s%s%s%s %s%5d%s%s%s%s%s", > > - mkstring(buf0, 5, RJUST|INT_DEC, (char *)(unsigned long)major), > > - space(MINSPACE), > > - mkstring(buf1, VADDR_PRLEN, LJUST|LONG_HEX, (char *)gendisk), > > - space(MINSPACE), > > - mkstring(buf2, 10, LJUST, disk_name), > > - space(MINSPACE), > > - mkstring(buf3, VADDR_PRLEN <= 11 ? 11 : VADDR_PRLEN, > > - LJUST|LONG_HEX, (char *)queue_addr), > > - space(MINSPACE), > > - io.read + io.write, > > - space(MINSPACE), > > - mkstring(buf4, 5, RJUST|INT_DEC, > > - (char *)(unsigned long)io.read), > > - space(MINSPACE), > > - mkstring(buf5, 5, RJUST|INT_DEC, > > - (char *)(unsigned long)io.write), > > - space(MINSPACE)); > > + fprintf(fp, "%s%s%s %s%s%s%s %s%5d%s%s%s%s%s", > > + mkstring(buf0, 5, RJUST|INT_DEC, (char *)(unsigned long)major), > > + space(MINSPACE), > > + mkstring(buf1, VADDR_PRLEN, LJUST|LONG_HEX, (char *)gendisk), > > + space(MINSPACE), > > + mkstring(buf2, 10, LJUST, disk_name), > > + space(MINSPACE), > > + mkstring(buf3, VADDR_PRLEN <= 11 ? 11 : VADDR_PRLEN, > > + LJUST|LONG_HEX, (char *)queue_addr), > > + space(MINSPACE), > > + io.read + io.write, > > + space(MINSPACE), > > + mkstring(buf4, 5, RJUST|INT_DEC, > > + (char *)(unsigned long)io.read), > > + space(MINSPACE), > > + mkstring(buf5, 5, RJUST|INT_DEC, > > + (char *)(unsigned long)io.write), > > + space(MINSPACE)); > > > > if (VALID_MEMBER(request_queue_in_flight)) { > > if (!use_mq_interface(queue_addr)) { > > @@ -4597,6 +4754,9 @@ void diskio_init(void) > > MEMBER_OFFSET_INIT(kobject_entry, "kobject", "entry"); > > MEMBER_OFFSET_INIT(kset_list, "kset", "list"); > > MEMBER_OFFSET_INIT(request_list_count, "request_list", "count"); > > + MEMBER_OFFSET_INIT(request_cmd_flags, "request", "cmd_flags"); > > + MEMBER_OFFSET_INIT(request_q, "request", "q"); > > + MEMBER_OFFSET_INIT(request_state, "request", "state"); > > MEMBER_OFFSET_INIT(request_queue_in_flight, "request_queue", > > "in_flight"); > > if (MEMBER_EXISTS("request_queue", "rq")) > > @@ -4608,10 +4768,33 @@ void diskio_init(void) > > "mq_ops"); > > ANON_MEMBER_OFFSET_INIT(request_queue_queue_ctx, > > "request_queue", "queue_ctx"); > > + MEMBER_OFFSET_INIT(request_queue_queue_hw_ctx, > > + "request_queue", "queue_hw_ctx"); > > + MEMBER_OFFSET_INIT(request_queue_nr_hw_queues, > > + "request_queue", "nr_hw_queues"); > > MEMBER_OFFSET_INIT(blk_mq_ctx_rq_dispatched, "blk_mq_ctx", > > "rq_dispatched"); > > MEMBER_OFFSET_INIT(blk_mq_ctx_rq_completed, "blk_mq_ctx", > > "rq_completed"); > > + MEMBER_OFFSET_INIT(blk_mq_hw_ctx_tags, "blk_mq_hw_ctx", "tags"); > > + MEMBER_OFFSET_INIT(blk_mq_tags_bitmap_tags, "blk_mq_tags", > > + "bitmap_tags"); > > + MEMBER_OFFSET_INIT(blk_mq_tags_breserved_tags, "blk_mq_tags", > > + "breserved_tags"); > > + MEMBER_OFFSET_INIT(blk_mq_tags_nr_reserved_tags, "blk_mq_tags", > > + "nr_reserved_tags"); > > + MEMBER_OFFSET_INIT(blk_mq_tags_rqs, "blk_mq_tags", "rqs"); > > + STRUCT_SIZE_INIT(blk_mq_tags, "blk_mq_tags"); > > + STRUCT_SIZE_INIT(sbitmap, "sbitmap"); > > + STRUCT_SIZE_INIT(sbitmap_word, "sbitmap_word"); > > + MEMBER_OFFSET_INIT(sbitmap_word_word, "sbitmap_word", "word"); > > + MEMBER_OFFSET_INIT(sbitmap_word_cleared, "sbitmap_word", "cleared"); > > + MEMBER_OFFSET_INIT(sbitmap_depth, "sbitmap", "depth"); > > + MEMBER_OFFSET_INIT(sbitmap_shift, "sbitmap", "shift"); > > + MEMBER_OFFSET_INIT(sbitmap_map_nr, "sbitmap", "map_nr"); > > + MEMBER_OFFSET_INIT(sbitmap_map, "sbitmap", "map"); > > + MEMBER_OFFSET_INIT(sbitmap_queue_sb, "sbitmap_queue", "sb"); > > + > > } > > MEMBER_OFFSET_INIT(subsys_private_klist_devices, "subsys_private", > > "klist_devices"); > > diff --git a/symbols.c b/symbols.c > > index 5d12a021c769..c1f09556d710 100644 > > --- a/symbols.c > > +++ b/symbols.c > > @@ -10385,6 +10385,12 @@ dump_offset_table(char *spec, ulong makestruct) > > OFFSET(kset_list)); > > fprintf(fp, " request_list_count: %ld\n", > > OFFSET(request_list_count)); > > + fprintf(fp, " request_cmd_flags: %ld\n", > > + OFFSET(request_cmd_flags)); > > + fprintf(fp, " request_q: %ld\n", > > + OFFSET(request_q)); > > + fprintf(fp, " request_state: %ld\n", > > + OFFSET(request_state)); > > fprintf(fp, " request_queue_in_flight: %ld\n", > > OFFSET(request_queue_in_flight)); > > fprintf(fp, " request_queue_rq: %ld\n", > > @@ -10393,10 +10399,25 @@ dump_offset_table(char *spec, ulong makestruct) > > OFFSET(request_queue_mq_ops)); > > fprintf(fp, " request_queue_queue_ctx: %ld\n", > > OFFSET(request_queue_queue_ctx)); > > + fprintf(fp, " request_queue_queue_hw_ctx: %ld\n", > > + OFFSET(request_queue_queue_hw_ctx)); > > + fprintf(fp, " request_queue_nr_hw_queues: %ld\n", > > + OFFSET(request_queue_nr_hw_queues)); > > fprintf(fp, " blk_mq_ctx_rq_dispatched: %ld\n", > > OFFSET(blk_mq_ctx_rq_dispatched)); > > fprintf(fp, " blk_mq_ctx_rq_completed: %ld\n", > > OFFSET(blk_mq_ctx_rq_completed)); > > + fprintf(fp, " blk_mq_hw_ctx_tags: %ld\n", > > + OFFSET(blk_mq_hw_ctx_tags)); > > + fprintf(fp, " blk_mq_tags_bitmap_tags: %ld\n", > > + OFFSET(blk_mq_tags_bitmap_tags)); > > + fprintf(fp, " blk_mq_tags_breserved_tags: %ld\n", > > + OFFSET(blk_mq_tags_breserved_tags)); > > + fprintf(fp, " blk_mq_tags_nr_reserved_tags: %ld\n", > > + OFFSET(blk_mq_tags_nr_reserved_tags)); > > + fprintf(fp, " blk_mq_tags_rqs: %ld\n", > > + OFFSET(blk_mq_tags_rqs)); > > Good, thanks for adjusting these indents. > > Thanks, > Kazu > > > + > > fprintf(fp, " subsys_private_klist_devices: %ld\n", > > OFFSET(subsys_private_klist_devices)); > > fprintf(fp, " subsystem_kset: %ld\n", > > @@ -11003,6 +11024,7 @@ dump_offset_table(char *spec, ulong makestruct) > > fprintf(fp, " sbitmap: %ld\n", SIZE(sbitmap)); > > fprintf(fp, " sbitmap_queue: %ld\n", SIZE(sbitmap_queue)); > > fprintf(fp, " sbq_wait_state: %ld\n", SIZE(sbq_wait_state)); > > + fprintf(fp, " blk_mq_tags: %ld\n", SIZE(blk_mq_tags)); > > > > fprintf(fp, "\n array_table:\n"); > > /* > -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility Contribution Guidelines: https://github.com/crash-utility/crash/wiki