On Fri, Jul 10, 2020 at 12:29:28PM +0200, Guoqing Jiang wrote:
> On 7/10/20 12:00 PM, Ming Lei wrote:
> > On Fri, Jul 10, 2020 at 10:55:24AM +0200, Guoqing Jiang wrote:
> > > Hi Ming,
> > >
> > > On 7/10/20 2:53 AM, Ming Lei wrote:
> > > > Hi Guoqing,
> > > >
> > > > On Thu, Jul 09, 2020 at 08:48:08PM +0200, Guoqing Jiang wrote:
> > > > > Hi Ming,
> > > > >
> > > > > On 7/8/20 4:06 PM, Guoqing Jiang wrote:
> > > > > > On 7/8/20 4:02 PM, Guoqing Jiang wrote:
> > > > > > > > Hi Guoqing,
> > > > > > > >
> > > > > > > > I believe it isn't hard to write an ebpf based script (bcc or
> > > > > > > > bpftrace) to collect this kind of performance data, so it looks
> > > > > > > > unnecessary to do it in the kernel.
> > > > > > > Hi Ming,
> > > > > > >
> > > > > > > Sorry, I don't know bcc or bpftrace well, but I assume they need
> > > > > > > to read the latency value from somewhere inside the kernel. Could
> > > > > > > you point out how I can get the latency value? Thanks in advance!
> > > > > > Hmm, I suppose biolatency is suitable for tracking latency, will
> > > > > > look into it.
> > > > > I think biolatency can't trace data if it is not running,
> > > > Yeah, the ebpf prog is only injected when the trace is started.
> > > >
> > > > > also it seems no place inside the kernel records such information for
> > > > > ebpf to read, correct me if my understanding is wrong.
> > > > Just record the info by starting the bcc script whenever you need it,
> > > > is there anything wrong with that usage? Always doing such stuff in the
> > > > kernel isn't fair to users who don't care about or need this info.
> > > That is why we add a Kconfig option and set it to N by default. And I
> > > suppose that with a modern cpu the cost of a few extra instructions would
> > > not be that expensive even when the option is enabled, just my $0.02.
> > >
> > > > > And as a cloud provider, we would like to get the data when necessary
> > > > > instead of collecting it by keeping a script running, because that is
> > > > > more expensive than just reading a node IMHO.
> > > > It shouldn't be expensive. It might be a bit slow to inject the ebpf
> > > > prog because the code has to be verified, however once it is inside the
> > > > kernel it should be efficient enough. The kernel side prog only updates
> > > > & stores the latency summary data in a bpf map, and the stored summary
> > > > data can be read out by userspace at any time.
> > > >
> > > > Could you explain a bit why it is expensive? such as biolatency
> > > I was comparing reading a sysfs node plus a few extra instructions in the
> > > kernel against launching a dedicated monitoring process, which occupies
> > > more resources (memory) and causes context switches. And for biolatency,
> > > it calls bpf_ktime_get_ns to calculate the latency of each IO, which I
> > > assume ends up in ktime_get_ns, and that is not cheap, as you said.
> > You can replace one read of the timestamp with rq->start_time_ns too, just
> > like what this patch does. And you can write your own bcc/bpftrace script,
> > which is quite easy to get started with. Once you learn its power, maybe
> > you will love it.
>
> Yes, I definitely need to learn more about it :-). But even with that
> change, I still believe reading a node is cheaper than a script.
>
> And it seems biolatency can't trace bio based drivers per below, while by
> collecting the data in-tree we can trace all block drivers.
>
> # load BPF program
> b = BPF(text=bpf_text)
> if args.queued:
>     b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
> else:
>     b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
>     b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
> b.attach_kprobe(event="blk_account_io_completion",
>     fn_name="trace_req_completion")
>
> Would it be possible to extend it to trace both requests and bios?
> Otherwise we have to run another script to trace md raid.

It is pretty easy to extend it to support bios: just add kprobes on
submit_bio() and bio_endio(); a rough bcc sketch is below.

thanks,
Ming
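Not verified here, just a rough sketch of what that extension could look
like, following the biolatency approach discussed above: attach kprobes to
submit_bio() and bio_endio(), keep the submit timestamp in a hash keyed by
the bio pointer, and accumulate the completion latency into a log2 histogram
map that userspace reads out on demand. The handler names (trace_bio_start /
trace_bio_done) and the microsecond bucketing are made up for illustration,
and split or chained bios in stacked drivers are not handled specially.

#!/usr/bin/env python
# rough sketch: biolatency-style histogram for bio based drivers (e.g. md),
# probing submit_bio()/bio_endio() instead of the request hooks
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct bio *);      // bio pointer -> submit timestamp (ns)
BPF_HISTOGRAM(dist);                // log2 latency histogram, in usecs

// kprobe on submit_bio(struct bio *bio): remember when the bio was issued
int trace_bio_start(struct pt_regs *ctx, struct bio *bio)
{
    u64 ts = bpf_ktime_get_ns();
    start.update(&bio, &ts);
    return 0;
}

// kprobe on bio_endio(struct bio *bio): compute latency, update the map
int trace_bio_done(struct pt_regs *ctx, struct bio *bio)
{
    u64 *tsp = start.lookup(&bio);
    if (tsp == 0)
        return 0;               // submitted before the probe was attached
    u64 delta = bpf_ktime_get_ns() - *tsp;
    dist.increment(bpf_log2l(delta / 1000));
    start.delete(&bio);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="submit_bio", fn_name="trace_bio_start")
b.attach_kprobe(event="bio_endio", fn_name="trace_bio_done")

print("Tracing bio completion latency... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

As in biolatency, the kernel-side program only does one bpf_ktime_get_ns()
plus a couple of map operations per IO, and the latency summary stays in the
bpf map until userspace asks for it.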