On Tue, Mar 05, 2019 at 10:59:52AM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 26, 2019 at 01:46:01AM -0500, Kris Van Hees wrote:
> > On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote:
> > > >
> > > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
> > > > support what tracers commonly need, and I am also looking at ways to further
> > > > extend this model to allow more tracer-specific features as well without the
> > > > need for adding a BPF program types for every tracer.
> > >
> > > It seems by themselves the patches don't provide any new functionality,
> > > but instead look like plumbing to call external code.
> >
> > The patches are definitely not plumbing to call external code, and if I gave
> > that impression I apologise. I overlooked the information you quote below on
> > allowing new functionality through modules when I wrote the comment above but
> > please note that it was a forward-looking comment in terms of what could be
> > done - not a reason for the patches that I submitted.
> >
> > The patches accomplish something that is totally independent from that: they
> > make it possible for existing events that execute BPF programs when triggered
> > to transfer control to a BPF program with a more rich context. The first
> > patch makes such a transfer possible (using tail-call combined with converting
> > the context to the new program type), and the second patch provides one such
> > program type (generic trace). The new functionality provided by the program
> > type is direct access to task information that previously could only be
> > obtained through helper calls. E.g. the new program type allows programs to
> > access the task state, prio, ppid, euid, and egid. None of those pieces of
> > information can currently be obtained unless you start poking around in
> > memory using bpf_probe_read() helper calls.
>
> I don't think I understand the problem you're trying to solve.
> From kprobe/tracepoints/etc bpf prog can use bpf_probe_read() to read everything.
> Are you saying direct access to state, prio, ppid, euid, and egid via context
> is much superior? Why? Because it's more stable?

When you provide tracing to non-privileged users you definitely do not want
to allow BPF programs to access any memory they want in kernel space, yet you
would still want to be able to provide a decent amount of information about
tasks at the time a probe fires.

> Why stop at these fields then? task_struct has many others.
>
> What we observed that no matter how many fields we add to stable uapi
> somebody will always request one more. For networking the total number of
> such fields is contained, but for tracing we're talking about thousands
> of useful fields. We cannot make them stable.
> Hence we've been working on alternative approach via BTF to make all
> of kernel internal fields sort-of stable via 'compile once' technique that
> we described at the last LPC.

Sure, but the ones I put in there were an example of how this can be used.
And again, in the case of unprivileged tracing, this easily becomes an issue
of where you end up enforcing what a tracing program can and cannot do.
There will always be cases where more than the 'standard' information is
needed for a tracing task, and then it would be quite reasonable to conclude
that a higher level of privileges is required to accomplish that - but that
shouldn't prevent unprivileged tracing from being useful as well. Again, the
limited set of fields I put in there right now is a matter of showing how
this can be used. It is certainly meant to be expanded quite a bit.
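
As a rough illustration of where such enforcement could live: BPF program
types already supply an is_valid_access() callback to the verifier, which
decides which context offsets a program may touch. The sketch below is a plain
userspace C model of that idea, not code from the patches. The names
struct gtrace_ctx_sketch and gtrace_ctx_access_ok(), and the field layout,
are hypothetical and chosen only to mirror the fields mentioned in this mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context layout (assumption, not the layout from the patch). */
struct gtrace_ctx_sketch {
    uint32_t task_state;
    uint32_t prio;
    uint32_t ppid;
    uint32_t euid;
    uint32_t egid;
};

enum access_type { ACCESS_READ, ACCESS_WRITE };

/*
 * Loosely modelled on a per-program-type is_valid_access() check:
 * only aligned, whole-field reads that lie inside the context pass.
 */
static bool gtrace_ctx_access_ok(size_t off, size_t size, enum access_type type)
{
    if (type != ACCESS_READ)
        return false;               /* context is read-only in this sketch */
    if (size != sizeof(uint32_t))
        return false;               /* whole fields only */
    if (off % sizeof(uint32_t) != 0)
        return false;               /* no unaligned access */
    if (off >= sizeof(struct gtrace_ctx_sketch))
        return false;               /* start must be inside the context */
    return off + size <= sizeof(struct gtrace_ctx_sketch);
}
```

With a check of this shape, a program can read ppid or euid straight from the
context, while anything outside the listed fields is rejected before the
program ever runs, so no arbitrary kernel memory read is involved.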

The primary reason though behind the context conversion approach and the
generic tracing program type and context is that tracing on Linux based on
the existing kernel facilities limits the userspace tools, because userspace
has quite limited control over what happens when a probe/event fires. One of
the features of advanced tracing tools has been the ability to have more
(safe) control over what happens when the probe/event fires and how data is
stored in output buffers. Since the userspace tool is the one requesting the
data and ultimately processing the generated data, it stands to reason that
it would benefit from having more freedom in that area. But that means it
needs to be able to provide a BPF program of a type that more closely relates
to the tracing tool functionality rather than the probe or event itself
(especially since probes and events are very specific, and by their very
nature should not really care about how userspace uses information).

This is again even more true for unprivileged tracing - right now there is a
lot of useful task information that you cannot get to without
bpf_probe_read(), but unprivileged users really shouldn't be able to just
read arbitrary kernel memory.

So in summary, I am trying to solve two (related) problems:

 - Ensure that unprivileged tracing can obtain information about the task
   that triggered a probe or event. There will always be limitations but we
   can do better than is available now.

 - Give tracing tools the ability to provide actions to be performed when a
   probe or event fires, beyond what the individual BPF program types allow
   for the specific probe/event types (and do it in a generic manner, in a
   sense encapsulating multiple probe/event types in a more generic tracing
   context).

A patch I am currently working on ties into this (and I hope to get it ready
sometime next week). It builds on the support you already have for accessing
packet data from the __sk_buff context.
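
For reference, the existing packet access convention exposes data and
data_end in the context, and a program has to show that an access stays below
data_end before the verifier accepts it. The following is a minimal userspace
C model of that bounds check. The names struct buf_ctx_sketch, range_ok() and
load_be16() are hypothetical; in a real BPF program the check is a pointer
comparison written inline and the verifier enforces it at load time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a context that exposes a data window. */
struct buf_ctx_sketch {
    const uint8_t *data;      /* first valid byte */
    const uint8_t *data_end;  /* one past the last valid byte */
};

/* True when [off, off + len) lies entirely inside [data, data_end). */
static int range_ok(const struct buf_ctx_sketch *ctx, size_t off, size_t len)
{
    size_t avail = (size_t)(ctx->data_end - ctx->data);

    return off <= avail && len <= avail - off;
}

/* Bounds-checked big-endian 16-bit load; returns 0 on success, -1 otherwise. */
static int load_be16(const struct buf_ctx_sketch *ctx, size_t off, uint16_t *out)
{
    if (!range_ok(ctx, off, sizeof(*out)))
        return -1;
    *out = (uint16_t)((ctx->data[off] << 8) | ctx->data[off + 1]);
    return 0;
}
```

The range check is done on lengths rather than by forming data + off first,
so the model never computes a pointer outside the buffer.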
If we can make this same functionality available to other contexts as well,
my goal would be to make it possible for the generic tracing context to have
a buffer (data and data_end members) that the BPF program can issue direct
stores to, as a means to allow a tracing program to control how data is
written into the buffer. I am still working out some details but I have a
prototype working, and it retains all safety provisions that BPF offers us.
But being able to do things like this without needing to touch the context of
any other BPF program type is a great benefit to offer tracing tools, as far
as I see it.

	Kris