On Fri, Jul 10, 2020 at 12:29:28PM +0200, Guoqing Jiang wrote:
> On 7/10/20 12:00 PM, Ming Lei wrote:
> > On Fri, Jul 10, 2020 at 10:55:24AM +0200, Guoqing Jiang wrote:
> > > Hi Ming,
> > >
> > > On 7/10/20 2:53 AM, Ming Lei wrote:
> > > > Hi Guoqing,
> > > >
> > > > On Thu, Jul 09, 2020 at 08:48:08PM +0200, Guoqing Jiang wrote:
> > > > > Hi Ming,
> > > > >
> > > > > On 7/8/20 4:06 PM, Guoqing Jiang wrote:
> > > > > > On 7/8/20 4:02 PM, Guoqing Jiang wrote:
> > > > > > > > Hi Guoqing,
> > > > > > > >
> > > > > > > > I believe it isn't hard to write an ebpf based script (bcc or
> > > > > > > > bpftrace) to collect this kind of performance data, so it looks
> > > > > > > > unnecessary to do it in the kernel.
> > > > > > > Hi Ming,
> > > > > > >
> > > > > > > Sorry, I don't know bcc or bpftrace well, but I assume they need
> > > > > > > to read the latency value from somewhere inside the kernel. Could
> > > > > > > you point out how I can get the latency value? Thanks in advance!
> > > > > > Hmm, I suppose biolatency is suitable for tracking latency, will
> > > > > > look into it.
> > > > > I think biolatency can't trace data if it is not running,
> > > > Yeah, the ebpf prog is only injected when the trace is started.
> > > >
> > > > > also it seems no place inside the kernel records such information for
> > > > > ebpf to read, correct me if my understanding is wrong.
> > > > Just record the info by starting the bcc script whenever you need it,
> > > > is there anything wrong with that usage? Always doing such stuff in the
> > > > kernel isn't fair to users who don't care about or need this info.
> > > That is why we add a Kconfig option and set it to N by default. And I
> > > suppose that with a modern cpu the cost of a few extra instructions would
> > > not be that expensive even when the option is enabled, just my $0.02.
> > >
> > > > > And as a cloud provider, we would like to get the data when necessary
> > > > > instead of collecting it by keeping a script running, because that is
> > > > > more expensive than just reading a node IMHO.
> > > > It shouldn't be expensive. It might be a bit slow to inject the ebpf
> > > > prog because the code has to be verified, however once it is inside the
> > > > kernel it should be efficient enough. The kernel side prog only updates
> > > > & stores the latency summary data in a bpf map, and the stored summary
> > > > data can be read out by userspace at any time.
> > > >
> > > > Could you explain a bit why it is expensive? such as biolatency
> > > I was comparing reading a sysfs node plus a few extra instructions in the
> > > kernel against launching a dedicated monitoring process, which occupies
> > > more resources (memory) and causes context switches. And for biolatency,
> > > it calls bpf_ktime_get_ns to calculate the latency of each IO, which I
> > > assume ends up in ktime_get_ns, and that is not cheap, as you said.
> > You can replace one read of the timestamp with rq->start_time_ns too, just
> > like what this patch does. And you can write your own bcc/bpftrace script,
> > which is quite easy to get started with. Once you learn its power, maybe
> > you will love it.
>
> Yes, I definitely need to learn more about it :-). But even with that
> change, I still believe reading a node is cheaper than a script.
>
> And it seems biolatency can't trace bio based drivers per below, while by
> collecting the data in-tree we can trace all block drivers.
>
> # load BPF program
> b = BPF(text=bpf_text)
> if args.queued:
>     b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
> else:
>     b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
>     b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
> b.attach_kprobe(event="blk_account_io_completion",
>     fn_name="trace_req_completion")
>
> Would it be possible to extend it to trace both requests and bios?
> Otherwise we have to run another script to trace md raid.

It is pretty easy to extend it to support bios: just add kprobes on
submit_bio() and bio_endio(); a rough bcc sketch is below.

thanks,
Ming
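Not verified here, just a rough sketch of what that extension could look
like, following the biolatency approach discussed above: attach kprobes to
submit_bio() and bio_endio(), keep the submit timestamp in a hash keyed by
the bio pointer, and accumulate the completion latency into a log2 histogram
map that userspace reads out on demand. The handler names (trace_bio_start /
trace_bio_done) and the microsecond bucketing are made up for illustration,
and split or chained bios in stacked drivers are not handled specially.

#!/usr/bin/env python
# rough sketch: biolatency-style histogram for bio based drivers (e.g. md),
# probing submit_bio()/bio_endio() instead of the request hooks
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct bio *);      // bio pointer -> submit timestamp (ns)
BPF_HISTOGRAM(dist);                // log2 latency histogram, in usecs

// kprobe on submit_bio(struct bio *bio): remember when the bio was issued
int trace_bio_start(struct pt_regs *ctx, struct bio *bio)
{
    u64 ts = bpf_ktime_get_ns();
    start.update(&bio, &ts);
    return 0;
}

// kprobe on bio_endio(struct bio *bio): compute latency, update the map
int trace_bio_done(struct pt_regs *ctx, struct bio *bio)
{
    u64 *tsp = start.lookup(&bio);
    if (tsp == 0)
        return 0;               // submitted before the probe was attached
    u64 delta = bpf_ktime_get_ns() - *tsp;
    dist.increment(bpf_log2l(delta / 1000));
    start.delete(&bio);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="submit_bio", fn_name="trace_bio_start")
b.attach_kprobe(event="bio_endio", fn_name="trace_bio_done")

print("Tracing bio completion latency... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

As in biolatency, the kernel-side program only does one bpf_ktime_get_ns()
plus a couple of map operations per IO, and the latency summary stays in the
bpf map until userspace asks for it.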