Re: [RFC PATCH v1 1/1] tracing/kprobe: Add multi-probe support for 'perf_kprobe' PMU

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi.

Le mercredi 23 août 2023, 02:36:14 CEST Masami Hiramatsu a écrit :
> Hi,
> 
> On Mon, 21 Aug 2023 14:55:14 +0200
> 
> Francis Laniel <flaniel@xxxxxxxxxxxxxxxxxxx> wrote:
> > > Could you tell me how do you use this feature, for what perpose?
> > 
> > Sure (I think I detailed this in the cover letter but I only sent it to
> > the
> > "main" mailing list and not the tracing one, sorry for this
> > inconvenience)!
> > 
> > Basically, I was adding NTFS tracing to an existing tool which monitors
> > slow I/Os using BPF [1].
> > To test the tool, I compiled a kernel with both NTFS module built-in and
> > figured out the write operations when done on ntfs3 were missing from the
> > output of the tool.
> > The problem comes from the library I use in the tool which does not handle
> > well when it exists different symbols with the same name.
> > Contrary to perf, which only handles kprobes through sysfs, the library
> > handles it in both way (sysfs and PMU) with a preference for PMU when
> > available [2].
> > 
> > After some discussion, I thought there could be a way to handle this
> > automatically in the kernel when using PMU kprobes, hence this patch.
> > I totally understand the case I described above is really a corner one,
> > but I thought this feature could be useful for other people.
> > In the case of the library itself, we could indeed find the address in
> > /proc/ kallsyms but it would mean having CAP_SYS_ADMIN which is not
> > forcefully something we want to enforce.
> > Also, if we need to read /boot/vmlinuz or /boot/System.map we also need to
> > be root as these files often belong to root and cannot be read by others.
> > So, this patch solves the above problem while not needing specific
> > capabilities as the kernel will solve it for us.
> 
> Thanks for the explanation. I got the background, and still have some
> questions.
> 
> - Is the analysis tool really necessary to be used by users other than
>   CAP_SYS_ADMIN? Even if it is useful, I still doubt CAP_PERFMON is safer
>   than CAP_SYS_ADMIN, because BPF program can access any kernel register.

For the tool itself, this is indeed not a problem as we rely on CAP_SYS_ADMIN.
But this one for the library, as they do not want to enforce CAP_SYS_ADMIN to 
use the library.

> - My concern about this solution (enabling kprobe PMU on all symbols which
>   have the same name) makes it hard to run the same BPF program on it.
>   This is because symbols with the same name do not necessarily have the
>   same arguments (theoretically). Also, the BPF will trace unwanted symbols
>   at unwanted timing.

Good point for the same name but different arguments!
I was too focused on my case (ntfs_file_write_iter()) and forgot about this.

> - Can you expand that library to handle the same name symbols differently?
>   I think this should be done in the user space, or in the kallsyms like
>   storing symbols with source line information.

I think we can find a way to handle this in user space by potentially 
abstracting the several PMU probe under one.
Or we can simply explode if a name correspond to several symbols and ask the 
user to use addr + offset to precise the symbol in this case.

> I understand this demand, but solving that with probing *all* symbols seems
> like a brute force solution and may cause another problem later.
> 
> But this is a good discussion item. Last month Alessandro sent a script
> which makes such symbols unique. Current problem is that the numbering is
> not enough to identify which one is from which source code.

Definitely, I wrote this specifically to create a discussion and gather some 
comments, hence the RFC tag.

> https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@gm
> ail.com/

I will definitely take a look at this contribution! Thank you for sharing the 
link!

> > > If you just need to trace/profile a specific function which has the same
> > > name symbols, you might be better to use `perf probe` +
> > > `/sys/kernel/tracing` or `perf record -e EVENT`.
> > > 
> > > Or if you need to run it with CAP_PERFMON, without CAP_SYS_ADMIN,
> > > we need to change a userspace tool to find the correct address and
> > > pass it to the perf_event_open().
> > > 
> > > > > > Added new events:
> > > > > >   probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > > >   probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > > > 
> > > > > > You can now use it in all perf tools, such as:
> > > > > >         perf record -e probe:ntfs_file_write_iter -aR sleep 1
> > > > > > 
> > > > > > root@vm-amd64:~# cat /sys/kernel/tracing/kprobe_events
> > > > > > p:probe/ntfs_file_write_iter _text+5088544
> > > > > > p:probe/ntfs_file_write_iter _text+5278560
> > > > > > 
> > > > > > > Thought?
> > > > > > 
> > > > > > This contribution is basically here to sort of mimic what perf
> > > > > > does
> > > > > > but
> > > > > > with PMU kprobes, as this is not possible to write in a sysfs file
> > > > > > with
> > > > > > this type of probe.
> > > > > 
> > > > > OK, I see it is for BPF only. Maybe BPF program can filter correct
> > > > > one
> > > > > to access the argument etc.
> > > > 
> > > > I am not sure I understand, can you please precise?
> > > > The eBPF program will be run when the kprobe will be triggered, so if
> > > > the
> > > > kprobe is armed for the function (e.g. old ntfs_file_write_iter()),
> > > > the
> > > > eBPF program will never be called.
> > > 
> > > As I said above, it is userspace BPF loader issue, because it has to
> > > specify the correct address via unique symbol + offset, instead of
> > > attaching all of them. I think that will be more side-effects.
> > > 
> > > But anyway, thanks for pointing this issue. I should fix kprobe event to
> > > reject the symbols which is not unique. That should be pointed by other
> > > unique symbols.
> > 
> > You are welcome and I thank you for the discussion.
> > Can you please precise more what you think about "reject the symbols which
> > is not unique"?
> 
> > Basically something like this:
> Yes, that's what I said.

OK, I will write something and send it as an RFC before end of the week then.

> > struct trace_event_call *
> > create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > 
> > 			  bool is_return)
> > 
> > {
> > 
> > 	...
> > 	if (!addr && func) {
> 
> if (func) {  /* because anyway if user specify "func" we have to solve
>  the symbol address */
> 
> > 		array.addrs = NULL;
> > 		array.size = 0;
> > 		ret = kallsyms_on_each_match_symbol(add_addr, func, &array);
> > 		if (ret)
> > 		
> > 			goto error_free;
> > 		
> > 		if (array.size != 1) {
> > 		
> > 			/*
> > 			
> > 			 * Function name corresponding to several symbols must
> > 			 * be passed by address only.
> > 			 */
> > 			
> > 			return -EINVAL;
> 
> This case may return a unique error code so that the caller can notice
> the problem.

Is it OK to add a dedicated error code for such a case?

> Thank you,
> 
> > 		}
> > 	
> > 	}
> > 	
> > 	
> > 	
> > 	...
> > 
> > }
> > ?
> > If my understanding is correct, I think I can write a patch to achieve
> > this.
> > 
> > 
> > 
> > Best regards.
> > ---
> > [1]: https://github.com/inspektor-gadget/inspektor-gadget/pull/1879
> > [2]: https://github.com/cilium/ebpf/blob/
> > 270c859894bd38cdd0c7783317b16343409e4df8/link/kprobe.go#L165-L191

Best regards.







[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux