Quoting Michael Sartain (2018-12-20 20:27:19)
> On Wed, Dec 19, 2018, at 12:22 PM, Steven Rostedt wrote:
> > On Wed, 19 Dec 2018 12:08:18 +0200
> > Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> wrote:
> >
> > > To me, it seems almost as if folks are too preoccupied with thinking if
> > > we somehow can do this through tracepoints, to stop and actually think
> > > if we should.
> >
> > Regardless of whether it should or shouldn't, one solution to this is
> > to make the trace event in question record basically nothing but a
> > pointer.
>
> Right now, these are the events I'm capturing w/ an AMD gpu:
>
> amdgpu_cs:0-1150 [002] 630662.649417: amdgpu_cs_ioctl: sched_job=3490671, timeline=gfx, context=105, seqno=3081096, ring_name=ffff91cb1ab1bdd0, num_ibs=3
> gfx-190 [000] 630662.649451: amdgpu_sched_run_job: sched_job=3490671, timeline=gfx, context=105, seqno=3081096, ring_name=ffff91cb1ab1bdd0, num_ibs=3
> gfx-190 [000] 630662.649454: dma_fence_signaled: driver=amd_sched timeline=gfx context=104 seqno=3081096
>
> With Intel gpu (and rebuilt kernel w/ CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS):
>
> <idle>-0 [002] 821.717208: intel_engine_notify: dev=0, engine=0:0, seqno=38896, waiters=1
> RenderThread-1024 [002] 825.722358: i915_request_queue: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, flags=0x0
> RenderThread-1024 [002] 825.722371: i915_request_add: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=0
> RenderThread-1024 [002] 825.722372: i915_request_submit: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=0
> RenderThread-1024 [002] 825.745964: i915_request_execute: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=42199
> RenderThread-1024 [002] 825.745964: i915_request_in: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, prio=0, global=42199, port=1
> <idle>-0 [002] 825.755943: intel_engine_notify: dev=0, engine=0:0, seqno=42199, waiters=1
>
> It's quite obvious that just because gpuvis sees those amdgpu tracepoints when
> running with
> the AMD card and parses and displays those, it does *not* get
> those same tracepoints when I run with an Intel gpu.
>
> And with my Iris Pro Graphics 580 Gen9, I can reasonably expect to get the
> above i915 tracepoints.
>
> But if I install a new Intel Xe Gen11, why should I expect to see those Gen9
> i915 tracepoint events? Is it because we are tying tracepoints and the
> created uABI to *kernel modules* and not the hardware?

You're on the correct track here.

The issue is that even for Gen9, we foresee an unnecessary maintenance
burden in keeping the tracepoints as we work on the scheduler, or them
disappearing from kernel scope once HW-assisted scheduling is enabled.

And the lack of versioning on tracepoints does not make it easy for
userspace to do graceful degradation or to detect the underlying
platform. Stuffing the versioning info into every tracepoint, to make
sure it's in the captured ring buffer being inspected, is not too
elegant, so some auxiliary interface would still have to be probed.

> I'm asking, because personally I would expect the hardware to drive these
> tracepoint events, much like I check cpu flags to see whether I can run AVX
> code, or perf has intel_pt recording on one machine, but not another.

If we wanted to make sure we can keep them stable within a gen, we would
have to move them closer to the point where we talk to hardware, and
would basically just emit information that a) came from userspace (which
is stable due to uABI) or b) is going to hardware (we don't expect the
underlying hardware to magically change).

> Right now gpuvis graphs the above events in an easy to understand view.
> Occasionally, it's really nice to use trace-cmd to get textual representation
> for grepping, etc. Storing pointers would obviously break that.

Steve's idea kind of solves that. There would be an auxiliary module
built out-of-tree (say, from gpuvis) that would emit a new tracepoint
"b", carrying more information, whenever tracepoint "a" triggers.
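
The relay idea above can be modelled in userspace roughly as follows. This is only an illustrative Python sketch, not kernel code: `TraceBus`, `Request`, `in_tree_probe_a` and `out_of_tree_probe_b` are hypothetical names invented for the illustration, and the field values are borrowed from the i915 trace lines quoted earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical userspace model of the relay pattern; none of these
# names correspond to real kernel APIs.
Probe = Callable[[object], None]


@dataclass
class TraceBus:
    """Toy stand-in for a tracepoint registry plus a trace ring buffer."""
    probes: Dict[str, List[Probe]] = field(default_factory=dict)
    ring: List[Tuple[str, dict]] = field(default_factory=list)

    def register_probe(self, event: str, probe: Probe) -> None:
        self.probes.setdefault(event, []).append(probe)

    def emit(self, event: str, fields: dict) -> None:
        # Append one record to the (unbounded, toy) ring buffer.
        self.ring.append((event, fields))

    def fire(self, event: str, payload: object) -> None:
        # Tracepoint call site: hand the raw payload to every attached probe.
        for probe in self.probes.get(event, []):
            probe(payload)


@dataclass
class Request:
    """Stands in for the internal object the pointer would refer to."""
    engine: str
    ctx: int
    seqno: int


def in_tree_probe_a(bus: TraceBus) -> Probe:
    # Event "a" records nothing but an opaque handle (here: the object id).
    def probe(req: object) -> None:
        bus.emit("a", {"ptr": hex(id(req))})
    return probe


def out_of_tree_probe_b(bus: TraceBus) -> Probe:
    # The auxiliary module knows the object layout and emits the richer "b".
    def probe(req: object) -> None:
        bus.emit("b", {"engine": req.engine, "ctx": req.ctx, "seqno": req.seqno})
    return probe


def demo() -> List[Tuple[str, dict]]:
    bus = TraceBus()
    bus.register_probe("a", in_tree_probe_a(bus))
    bus.register_probe("a", out_of_tree_probe_b(bus))  # "load the module"
    bus.fire("a", Request(engine="0:0", ctx=30, seqno=38896))
    return bus.ring
```

In this model only the helper that emits "b" depends on the layout of `Request`, so the in-tree event "a" exposes nothing that could harden into uABI.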
So basically you would stop looking for tracepoint "a", load your
module, and just look for "b".

It's a bit of a grey area when it comes to breaking userspace, and the
philosophical question is whether it is us breaking userspace or
userspace setting itself a trap. But I guess it might be OK, if the
distros knowingly bundle such an out-of-tree module (which is not
subject to kernel stability).

> I guess if it's
> what we need to do to avoid the uABI problem, then it's what we do - still
> better than using an entirely new tracing system if we can avoid that.

The bigger problem, which I'd still like to hear some ideas for before
drawing conclusions, is about elegantly sourcing the tracepoints from
hardware events. Trying to do a live conversion from the
hardware-generated ring buffer during execution, just to make sure it
interleaves with the software-generated ring buffer and works under the
same trigger, does not sound very performant.

The usefulness of HW-related "special" tracepoints, without gpuvis doing
the time sorting based on parameters rather than the timestamp, could be
too low for use with the general tooling you mentioned.

Then we have to think about how much effort it is worth putting into
solving the HW to SW tracepoint injection if we could, with less total
effort, have a secondary interface for hardware events.

Anyways, Happy Holidays all! I'll be back after New Year.

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx