On Thu, May 23, 2019 at 09:57:37PM -0400, Steven Rostedt wrote:
> On Thu, 23 May 2019 17:31:50 -0700
> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> > > Now from what I'm reading, it seems that the Dtrace layer may be
> > > abstracting out fields from the kernel. This is actually something I
> > > have been thinking about to solve the "tracepoint abi" issue. There's
> > > usually basic ideas that happen. An interrupt goes off, there's a
> > > handler, etc. We could abstract that out that we trace when an
> > > interrupt goes off and the handler happens, and record the vector
> > > number, and/or what device it was for. We have tracepoints in the
> > > kernel that do this, but they do depend a bit on the implementation.
> > > Now, if we could get a layer that abstracts this information away from
> > > the implementation, then I think that's a *good* thing.
> >
> > I don't like this deferred irq idea at all.
>
> What do you mean deferred?

That's how I interpreted your proposal:
"interrupt goes off and the handler happens, and record the vector number"
It's not a good thing to report an irq later.
That's like saying let's record a perf counter event and report it later.

> > Abstracting details from the users is _never_ a good idea.
>
> Really? Most everything we do is to abstract details from the user. The
> key is to make the abstraction more meaningful than the raw data.
>
> > A ton of people use bcc scripts and bpftrace because they want those details.
> > They need to know what kernel is doing to make better decisions.
> > Delaying irq record is the opposite.
>
> I never said anything about delaying the record. Just getting the
> information that is needed.
>
> > >
> > > I wish that was totally true, but tracepoints *can* be an abi. I had
> > > code reverted because powertop required one to be a specific
> > > format. To this day, the wakeup event has a "success" field that
> > > writes in a hardcoded "1", because there's tools that depend on it,
> > > and they only work if there's a success field and the value is 1.
> >
> > I really think that you should put powertop nightmares to rest.
> > That was long ago. The kernel is different now.
>
> Is it?
>
> > Linus made it clear several times that it is ok to change _all_
> > tracepoints. Period. Some maintainers somehow still don't believe
> > that they can do it.
>
> From what I remember him saying several times, is that you can change
> all tracepoints, but if it breaks a tool that is useful, then that
> change will get reverted. He will allow you to go and fix that tool and
> bring back the change (which was the solution to powertop).

My interpretation is different.
We changed tracepoints. It broke scripts. People changed scripts.

> >
> > Some tracepoints are used more than others and more people will
> > complain: "ohh I need to change my script" when that tracepoint
> > changes. But the kernel development is not going to be hampered by a
> > tracepoint. No matter how widespread its usage in scripts.
>
> That's because we'll treat bpf (and Dtrace) scripts like modules (no
> abi), at least we better. But if there's a tool that doesn't use the
> script and reads the tracepoint directly via perf, then that's a
> different story.

Absolutely not.
A tracepoint is a tracepoint. It can change regardless of what
is using it or how.