Hi. Le mercredi 23 août 2023, 02:36:14 CEST Masami Hiramatsu a écrit : > Hi, > > On Mon, 21 Aug 2023 14:55:14 +0200 > > Francis Laniel <flaniel@xxxxxxxxxxxxxxxxxxx> wrote: > > > Could you tell me how do you use this feature, for what perpose? > > > > Sure (I think I detailed this in the cover letter but I only sent it to > > the > > "main" mailing list and not the tracing one, sorry for this > > inconvenience)! > > > > Basically, I was adding NTFS tracing to an existing tool which monitors > > slow I/Os using BPF [1]. > > To test the tool, I compiled a kernel with both NTFS module built-in and > > figured out the write operations when done on ntfs3 were missing from the > > output of the tool. > > The problem comes from the library I use in the tool which does not handle > > well when it exists different symbols with the same name. > > Contrary to perf, which only handles kprobes through sysfs, the library > > handles it in both way (sysfs and PMU) with a preference for PMU when > > available [2]. > > > > After some discussion, I thought there could be a way to handle this > > automatically in the kernel when using PMU kprobes, hence this patch. > > I totally understand the case I described above is really a corner one, > > but I thought this feature could be useful for other people. > > In the case of the library itself, we could indeed find the address in > > /proc/ kallsyms but it would mean having CAP_SYS_ADMIN which is not > > forcefully something we want to enforce. > > Also, if we need to read /boot/vmlinuz or /boot/System.map we also need to > > be root as these files often belong to root and cannot be read by others. > > So, this patch solves the above problem while not needing specific > > capabilities as the kernel will solve it for us. > > Thanks for the explanation. I got the background, and still have some > questions. > > - Is the analysis tool really necessary to be used by users other than > CAP_SYS_ADMIN? Even if it is useful, I still doubt CAP_PERFMON is safer > than CAP_SYS_ADMIN, because BPF program can access any kernel register. For the tool itself, this is indeed not a problem as we rely on CAP_SYS_ADMIN. But this one for the library, as they do not want to enforce CAP_SYS_ADMIN to use the library. > - My concern about this solution (enabling kprobe PMU on all symbols which > have the same name) makes it hard to run the same BPF program on it. > This is because symbols with the same name do not necessarily have the > same arguments (theoretically). Also, the BPF will trace unwanted symbols > at unwanted timing. Good point for the same name but different arguments! I was too focused on my case (ntfs_file_write_iter()) and forgot about this. > - Can you expand that library to handle the same name symbols differently? > I think this should be done in the user space, or in the kallsyms like > storing symbols with source line information. I think we can find a way to handle this in user space by potentially abstracting the several PMU probe under one. Or we can simply explode if a name correspond to several symbols and ask the user to use addr + offset to precise the symbol in this case. > I understand this demand, but solving that with probing *all* symbols seems > like a brute force solution and may cause another problem later. > > But this is a good discussion item. Last month Alessandro sent a script > which makes such symbols unique. Current problem is that the numbering is > not enough to identify which one is from which source code. Definitely, I wrote this specifically to create a discussion and gather some comments, hence the RFC tag. > https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@gm > ail.com/ I will definitely take a look at this contribution! Thank you for sharing the link! > > > If you just need to trace/profile a specific function which has the same > > > name symbols, you might be better to use `perf probe` + > > > `/sys/kernel/tracing` or `perf record -e EVENT`. > > > > > > Or if you need to run it with CAP_PERFMON, without CAP_SYS_ADMIN, > > > we need to change a userspace tool to find the correct address and > > > pass it to the perf_event_open(). > > > > > > > > > Added new events: > > > > > > probe:ntfs_file_write_iter (on ntfs_file_write_iter) > > > > > > probe:ntfs_file_write_iter (on ntfs_file_write_iter) > > > > > > > > > > > > You can now use it in all perf tools, such as: > > > > > > perf record -e probe:ntfs_file_write_iter -aR sleep 1 > > > > > > > > > > > > root@vm-amd64:~# cat /sys/kernel/tracing/kprobe_events > > > > > > p:probe/ntfs_file_write_iter _text+5088544 > > > > > > p:probe/ntfs_file_write_iter _text+5278560 > > > > > > > > > > > > > Thought? > > > > > > > > > > > > This contribution is basically here to sort of mimic what perf > > > > > > does > > > > > > but > > > > > > with PMU kprobes, as this is not possible to write in a sysfs file > > > > > > with > > > > > > this type of probe. > > > > > > > > > > OK, I see it is for BPF only. Maybe BPF program can filter correct > > > > > one > > > > > to access the argument etc. > > > > > > > > I am not sure I understand, can you please precise? > > > > The eBPF program will be run when the kprobe will be triggered, so if > > > > the > > > > kprobe is armed for the function (e.g. old ntfs_file_write_iter()), > > > > the > > > > eBPF program will never be called. > > > > > > As I said above, it is userspace BPF loader issue, because it has to > > > specify the correct address via unique symbol + offset, instead of > > > attaching all of them. I think that will be more side-effects. > > > > > > But anyway, thanks for pointing this issue. I should fix kprobe event to > > > reject the symbols which is not unique. That should be pointed by other > > > unique symbols. > > > > You are welcome and I thank you for the discussion. > > Can you please precise more what you think about "reject the symbols which > > is not unique"? > > > Basically something like this: > Yes, that's what I said. OK, I will write something and send it as an RFC before end of the week then. > > struct trace_event_call * > > create_local_trace_kprobe(char *func, void *addr, unsigned long offs, > > > > bool is_return) > > > > { > > > > ... > > if (!addr && func) { > > if (func) { /* because anyway if user specify "func" we have to solve > the symbol address */ > > > array.addrs = NULL; > > array.size = 0; > > ret = kallsyms_on_each_match_symbol(add_addr, func, &array); > > if (ret) > > > > goto error_free; > > > > if (array.size != 1) { > > > > /* > > > > * Function name corresponding to several symbols must > > * be passed by address only. > > */ > > > > return -EINVAL; > > This case may return a unique error code so that the caller can notice > the problem. Is it OK to add a dedicated error code for such a case? > Thank you, > > > } > > > > } > > > > > > > > ... > > > > } > > ? > > If my understanding is correct, I think I can write a patch to achieve > > this. > > > > > > > > Best regards. > > --- > > [1]: https://github.com/inspektor-gadget/inspektor-gadget/pull/1879 > > [2]: https://github.com/cilium/ebpf/blob/ > > 270c859894bd38cdd0c7783317b16343409e4df8/link/kprobe.go#L165-L191 Best regards.