On 25/06/2018 21:02, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-06-25 18:25:46)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
This Kconfig option was added to protect the implementation specific
internals from user expectations but so far it was mostly hassle.
Remove it so it is possible to debug request submission on any kernel
anywhere.
Our job is not to let bugs into the wild ;)
I did not word that well - I actually meant debugging the engine
timelines for unexpected stalls and/or dependencies. So more about
userspace being able to analyse what's happening.
This adds around 4k to the default i915.ko build but should have no
performance effects due to inactive tracepoints being no-op-ed out and
out-of-line.
Users should remember tracepoints which are close to low level i915
implementation details are subject to change and cannot be guaranteed.
That's the caveat that I feel needs fleshing out. Burying it had the
advantage of making it quite clear that you had to opt in and pick up
the pieces when it inevitably breaks.
What is wanted and what can we reasonably provide? If the tracepoints
need to undergo major change before the next LTS, let alone for the
life of that LTS...
If we know what is wanted can we define that better in terms of
dma_fence and leave lowlevel for debugging (or think of how we achieve
the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
that.
What is wanted is, for instance, to take trace.pl on any kernel anywhere
and have it deduce/draw the exact metrics/timeline of command
submission for a workload.
At the moment, without low level tracepoints and without the
intel_engine_notify tweak, how close it can get is workload
dependent.
So a set of tracepoints to allow drawing the timeline:
1. request_queue (or _add)
2. request_submit
3. intel_engine_notify
4. request_in/out
With this set the above is possible and we don't need a lot of work to
get there.
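
To illustrate how such a set could be turned into a timeline, here is a
minimal sketch in Python. It is not how trace.pl works; the record layout
(timestamp in microseconds, event name, request id) and the request ids are
assumptions made up for the example, and intel_engine_notify is left out
for brevity.

```python
from collections import defaultdict

# Event names mirror the tracepoints listed above; the record layout
# (timestamp_us, event_name, request_id) is hypothetical.
STAGES = ("request_queue", "request_submit", "request_in", "request_out")


def build_timeline(events):
    """Group records per request, keeping the first timestamp per event."""
    timeline = defaultdict(dict)
    for ts, name, req in sorted(events, key=lambda e: e[0]):
        timeline[req].setdefault(name, ts)
    return dict(timeline)


def request_latencies(stamps):
    """Return per-stage durations, or None if any stage is missing."""
    if any(stage not in stamps for stage in STAGES):
        return None
    return {
        "queued": stamps["request_submit"] - stamps["request_queue"],
        "pending": stamps["request_in"] - stamps["request_submit"],
        "running": stamps["request_out"] - stamps["request_in"],
    }


# Made-up sample data: one complete request and one still queued.
sample_events = [
    (100, "request_queue", "rcs0:1"),
    (120, "request_queue", "rcs0:2"),
    (130, "request_submit", "rcs0:1"),
    (150, "request_in", "rcs0:1"),
    (400, "request_out", "rcs0:1"),
]

timeline = build_timeline(sample_events)
for req, stamps in timeline.items():
    print(req, request_latencies(stamps))
```

A request missing any of the stages yields None, which is roughly the
situation when only part of the tracepoint set is available.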
And with the Virtual Engine it will become more interesting to have
this. So if we had a bug report saying load balancing is not working
well, we could just say "please run it via trace.pl --trace and attach
perf script output". That way we could easily see whether or not it is a
problem in userspace behaviour or elsewhere.
Regards,
Tvrtko