Hi Li, From: Li Zefan <lizf@xxxxxxxxxxxxxx> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Tue, 12 May 2009 16:10:15 +0800 > CC more people on tracing issue > > Ryo Tsuruta wrote: > > Hi Li, > > > > From: Li Zefan <lizf@xxxxxxxxxxxxxx> > > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > > Date: Tue, 12 May 2009 11:49:07 +0800 > > > >> Ryo Tsuruta wrote: > >>> Hi Li, > >>> > >>> From: Ryo Tsuruta <ryov@xxxxxxxxxxxxx> > >>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > >>> Date: Thu, 07 May 2009 09:23:22 +0900 (JST) > >>> > >>>> Hi Li, > >>>> > >>>> From: Li Zefan <lizf@xxxxxxxxxxxxxx> > >>>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > >>>> Date: Mon, 04 May 2009 11:24:27 +0800 > >>>> > >>>>> Ryo Tsuruta wrote: > >>>>>> Hi Alan, > >>>>>> > >>>>>>> Hi Ryo - > >>>>>>> > >>>>>>> I don't know if you are taking in patches, but whilst trying to uncover > >>>>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > >>>>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > >>>>>>> upstream you'll have to #if around these (the blktrace message stuff > >>>>>>> came in around 2.6.26 or 27 I think). > >>>>>>> > >>>>>>> My test case was to take a single 400GB storage device, put two 200GB > >>>>>>> partitions on it and then see what the "penalty" or overhead for adding > >>>>>>> dm-ioband on top. To do this I simply created an ext2 FS on each > >>>>>>> partition in parallel (two processes each doing a mkfs to one of the > >>>>>>> partitions). Then I put two dm-ioband devices on top of the two > >>>>>>> partitions (setting the weight to 100 in both cases - thus they should > >>>>>>> have equal access). > >>>>>>> > >>>>>>> Using default values I was seeing /very/ large differences - on the > >>>>>>> order of 3X. When I bumped the number of tokens to a large number > >>>>>>> (10,240) the timings got much closer (<2%). I have found that using > >>>>>>> weight-iosize performs worse than weight (closer to 5% penalty). > >>>>>> I could reproduce similar results. One dm-ioband device seems to stop > >>>>>> issuing I/Os for a few seconds at times. I'll investigate more on that. > >>>>>> > >>>>>>> I'll try to formalize these results as I go forward and report out on > >>>>>>> them. In any event, I thought I'd share this patch with you if you are > >>>>>>> interested... > >>>>>> Thanks. I'll include your patche into the next release. > >>>>>> > >>>>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). > >>>> Thanks for your suggestion. I'll use TRACE_EVENT instead. > >>> blk_add_trace_msg() supports both blktrace and tracepoints. I can > >>> get messages from dm-ioband through debugfs. Could you expain why > >>> should we use TRACE_EVENT instead? > >>> > >> Actually blk_add_trace_msg() has nothing to do with tracepoints.. > >> > >> If we use blk_add_trace_msg() is dm, we can use it in md, various block > >> drivers and even ext4. So the right thing is, if a subsystem wants to add > >> trace facility, it should use tracepoints/TRACE_EVENT. > >> > >> With TRACE_EVENT, you can get output through debugfs too, and it can be used > >> together with blktrace: > >> > >> # echo 1 > /sys/block/dm/trace/enable > >> # echo blk > /debugfs/tracing/current_tracer > >> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event > >> # cat /deubgfs/tracing/trace_pipe > >> > >> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can > >> use filter feature too. > > > > Thanks for explaining. > > The base kernel of current dm tree (2.6.30-rc4) has not supported > > dm-device tracing yet. I'll consider using TRACE_EVENT when the base > > kernel supports dm-device tracing. > > > > I think we don't have any plan on dm-device tracing support, do we? And you > don't need that. > > I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made > a patch to convert those blktrace msgs to trace events. > > # ls /debug/tracing/events/dm-ioband > enable ioband_endio ioband_issue_list ioband_urgent_bio > filter ioband_hold_bio ioband_make_request > # echo 1 /debug/tracing/events/dm-ioband/enable > # echo blk /debug/tracing/current_tracer > ... Is the following line unnecessary in the latest tip-tree? # echo 1 > /sys/block/dm/trace/enable How to enable/disable tracing on each device? O.K. I'll try the latest tip-tree. Thanks for your help! > > Here is a sample output: > > pdflush-237 [001] 1706.377788: ioband_hold_bio: 8,9 ioband 1 hold_nrm 1 > pdflush-237 [001] 1706.377795: 254,0 Q W 188832 + 8 [pdflush] > pdflush-237 [001] 1706.377800: ioband_hold_bio: 8,9 ioband 1 hold_nrm 2 > pdflush-237 [001] 1706.377806: 254,0 Q W 197032 + 8 [pdflush] > pdflush-237 [001] 1706.377835: 254,0 C W 82232 + 8 [0] > kioband/1-2569 [001] 1706.377922: ioband_issue_list: 8,9 ioband 1 add_iss 1 1 > kioband/1-2569 [001] 1706.377923: ioband_issue_list: 8,9 ioband 1 add_iss 0 0 > kioband/1-2569 [001] 1706.377925: ioband_make_request: 8,9 ioband 1 pop_iss > kioband/1-2569 [001] 1706.377940: ioband_make_request: 8,9 ioband 1 pop_iss > konsole-2140 [001] 1706.377969: 254,0 C W 90432 + 8 [0] > konsole-2140 [001] 1706.378087: 254,0 C W 98632 + 8 [0] > konsole-2140 [001] 1706.378232: 254,0 C W 106832 + 8 [0] > konsole-2140 [001] 1706.378350: 254,0 C W 115032 + 8 [0] > konsole-2140 [001] 1706.378466: 254,0 C W 123232 + 8 [0] > konsole-2140 [001] 1706.378581: 254,0 C W 131432 + 8 [0] > > --- > drivers/md/dm-ioband-ctl.c | 19 +++++- > include/trace/events/dm-ioband.h | 134 ++++++++++++++++++++++++++++++++++++++ > 2 files changed, 150 insertions(+), 3 deletions(-) > create mode 100644 include/trace/events/dm-ioband.h > > diff --git a/drivers/md/dm-ioband-ctl.c b/drivers/md/dm-ioband-ctl.c > index 151b178..4033724 100644 > --- a/drivers/md/dm-ioband-ctl.c > +++ b/drivers/md/dm-ioband-ctl.c > @@ -17,6 +17,9 @@ > #include "md.h" > #include "dm-ioband.h" > > +#define CREATE_TRACE_POINTS > +#include <trace/events/dm-ioband.h> > + > #define POLICY_PARAM_START 6 > #define POLICY_PARAM_DELIM "=:," > > @@ -632,9 +635,11 @@ static void hold_bio(struct ioband_group *gp, struct bio *bio) > */ > dp->g_prepare_bio(gp, bio, IOBAND_URGENT); > bio_list_add(&dp->g_urgent_bios, bio); > + trace_ioband_hold_bio(bio, dp, true); > } else { > gp->c_blocked++; > dp->g_hold_bio(gp, bio); > + trace_ioband_hold_bio(bio, dp, false); > } > } > > @@ -675,14 +680,17 @@ static int make_issue_list(struct ioband_group *gp, struct bio *bio, > clear_group_blocked(gp); > wake_up_all(&gp->c_waitq); > } > - if (should_pushback_bio(gp)) > + if (should_pushback_bio(gp)) { > bio_list_add(pushback_list, bio); > + trace_ioband_issue_list(bio, dp, gp, true); > + } > else { > int rw = bio_data_dir(bio); > > gp->c_stat[rw].deferred++; > gp->c_stat[rw].sectors += bio_sectors(bio); > bio_list_add(issue_list, bio); > + trace_ioband_issue_list(bio, dp, gp, false); > } > return prepare_to_issue(gp, bio); > } > @@ -702,6 +710,7 @@ static void release_urgent_bios(struct ioband_device *dp, > dp->g_blocked--; > dp->g_issued[bio_data_dir(bio)]++; > bio_list_add(issue_list, bio); > + trace_ioband_urgent_bio(bio, dp); > } > } > > @@ -915,10 +924,14 @@ static void ioband_conduct(struct work_struct *work) > > spin_unlock_irqrestore(&dp->g_lock, flags); > > - while ((bio = bio_list_pop(&issue_list))) > + while ((bio = bio_list_pop(&issue_list))) { > + trace_ioband_make_request(bio, dp); > generic_make_request(bio); > - while ((bio = bio_list_pop(&pushback_list))) > + } > + while ((bio = bio_list_pop(&pushback_list))) { > + trace_ioband_endio(bio, dp); > bio_endio(bio, -EIO); > + } > } > > static int ioband_end_io(struct dm_target *ti, struct bio *bio, > diff --git a/include/trace/events/dm-ioband.h b/include/trace/events/dm-ioband.h > new file mode 100644 > index 0000000..17dd61c > --- /dev/null > +++ b/include/trace/events/dm-ioband.h > @@ -0,0 +1,134 @@ > +#if !defined(_TRACE_DM_IOBAND_H) || defined(TRACE_HEADER_MULTI_READ) > +#define _TRACE_DM_IOBAND_H > + > +#include <linux/tracepoint.h> > +#include <linux/interrupt.h> > + > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM dm-ioband > + > +TRACE_EVENT(ioband_hold_bio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp, bool urg), > + > + TP_ARGS(bio, dp, urg), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( bool, urg ) > + __field( int, g_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->urg = urg; > + __entry->g_blocked = dp->g_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s hold_%s %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), > + __entry->urg ? "urg" : "nrm", __entry->g_blocked) > +); > + > +TRACE_EVENT(ioband_issue_list, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp, > + struct ioband_group *gp, bool pushback), > + > + TP_ARGS(bio, dp, gp, pushback), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( bool, pushback ) > + __field( int, g_blocked ) > + __field( int, c_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->pushback = pushback; > + __entry->g_blocked = dp->g_blocked; > + __entry->c_blocked = gp->c_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s add_%s %d %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), > + __entry->pushback ? "pback" : "iss", > + __entry->g_blocked, __entry->c_blocked) > +); > + > +TRACE_EVENT(ioband_urgent_bio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( int, g_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->g_blocked = dp->g_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s urg_add_iss %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), __entry->g_blocked) > +); > + > +TRACE_EVENT(ioband_make_request, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s pop_iss", > + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) > +); > + > +TRACE_EVENT(ioband_endio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s pop_pback", > + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) > +); > + > +#endif /* _TRACE_DM_IOBAND_H */ > + > +/* This part must be outside protection */ > +#include <trace/define_trace.h> > + > + > -- > 1.5.4.rc3 > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel