Re: [RFC 0/3] Engine utilization tracking

On 5/10/2017 12:45 PM, Daniel Vetter wrote:
On Wed, May 10, 2017 at 10:38 AM, Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
On 09/05/2017 19:11, Dmitry Rogozhkin wrote:
On 5/9/2017 8:51 AM, Tvrtko Ursulin wrote:
On 09/05/2017 16:29, Chris Wilson wrote:
On Tue, May 09, 2017 at 04:16:41PM +0100, Tvrtko Ursulin wrote:

On 09/05/2017 15:26, Chris Wilson wrote:
On Tue, May 09, 2017 at 03:09:33PM +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

By popular customer demand here is the prototype for cheap engine
utilization
tracking.

customer and debugfs?

Well I did write in one of the following paragraphs on this topic.
Perhaps I should have put it in procfs. :) Sysfs API looks
restrictive or perhaps I missed a way to get low level (fops) access
to it.

It uses static branches so in the default off case it really
should be cheap.

Not as cheap (for the off case) as simply sampling RING_HEAD/RING_TAIL

Off case are three no-op instructions in three places in the irq
tasklet. And a little bit of object size growth, if you worry about
that aspect?

It's just how the snowball begins.

We should be able to control it. We also have to consider which one is
lighter for this particular use case.

which looks to be the same level of detail. I wrapped all this up in a
perf interface once upon a time...

How does that work? Via periodic sampling? Accuracy sounds like it
would be proportionate to the sampling frequency, no?

Right, and the sampling frequency is under user control (via perf) with
a default of around 1000, gives a small systematic error when dealing
with %
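
To illustrate the sampling trade-off discussed above, here is a minimal sketch, not taken from any of the patches. It assumes the engine state is checked at evenly spaced points and compares the result with the exact busy share; the function names and the interval representation are made up for illustration.

```python
def sampled_busy_percent(busy_intervals, window_s, sample_hz):
    """Estimate busy % by checking engine state at evenly spaced sample points.

    busy_intervals: list of (start, end) times in seconds (hypothetical input).
    """
    n_samples = int(window_s * sample_hz)
    hits = 0
    for i in range(n_samples):
        t = (i + 0.5) / sample_hz  # sample at the midpoint of each period
        if any(start <= t < end for start, end in busy_intervals):
            hits += 1
    return 100.0 * hits / n_samples


def exact_busy_percent(busy_intervals, window_s):
    """Exact busy % computed from the interval lengths themselves."""
    busy = sum(end - start for start, end in busy_intervals)
    return 100.0 * busy / window_s


if __name__ == "__main__":
    intervals = [(0.0, 0.25)]  # engine busy for the first quarter second
    for hz in (10, 1000):
        print(hz, sampled_busy_percent(intervals, 1.0, hz),
              exact_busy_percent(intervals, 1.0))
```

In this toy model, a busy period shorter than the sampling period can be missed entirely, which is one way the error grows as the sampling frequency drops.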

I included power, interrupts, rc6, frequency (and the statistics but I
never used those and dropped them once oa landed), as well as
utilisation, just for the convenience of having sane interface :)

Can you resurrect those patches? Don't have to rebase and all but I
would like to see them at least.
Mind that the requested stats are primarily meant for customers in the
_product_ environment, to track GPU occupancy and predict from these
stats whether they can execute something else.
Which means that 1) debugfs and any kind of debug-like infrastructure is

Yeah I acknowledged in the cover letter debugfs is not ideal.

I could implement it in sysfs I suppose by doing time based transitions as
opposed to having explicit open/release hooks. It wouldn't make a
fundamental difference to this RFC from the overhead point of view.
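
One possible reading of "time based transitions" is that tracking is switched on by a read and switched off again once nobody has read for a while. The sketch below models only that state machine in user-space Python; it is an assumption about the idea, not the RFC's code, and the class name, timeout, and `tick()` housekeeping call are all hypothetical.

```python
import time


class TimedStats:
    """Toy model: a read enables tracking, an idle timeout disables it."""

    def __init__(self, idle_timeout_s=5.0, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_read = None
        self.enabled = False

    def read(self):
        """Stands in for a read of the (hypothetical) sysfs file."""
        self._last_read = self._clock()
        self.enabled = True

    def tick(self):
        """Periodic housekeeping: disable tracking after the idle timeout."""
        if self.enabled and self._clock() - self._last_read >= self.idle_timeout_s:
            self.enabled = False
        return self.enabled
```

Compared with explicit open/release hooks, this trades a precise "last user gone" event for a timeout, so tracking may stay enabled slightly longer than needed.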

But most importantly we need to see in detail what Chris' perf-based
idea looks like and whether it fits your requirements.
+1 on perf pmu, that sounds much more like the userspace interface
you're looking for. If it's not that, then perhaps hand-rolled like
the i915 OA stuff we now have (but starting out with a perf pmu sounds
much better, at least for anything global which doesn't need to be
per-context or per-batch).
-Daniel
You know, thinking once more about which interface I would like to see as a user, I would say the following. As a user I expect to have easy access to basic GPU information and current characteristics. This information includes:

1. GPU frequency characteristics: current running frequency, min/max SW limits, min/max HW limits, boost frequency settings (if any), driver power/performance preset (if any).
2. Basic information about the high-level structure of the GPU, specifically the engines capable of working in parallel: number of VDBOX engines, number of VEBOX engines, etc.
3. A high-level metric to understand how busy the GPU was over time: busy clocks for each engine.

I would assume that there will be users who simply log in to the system and want to get the above info quickly by running cat on a sysfs file. I would also assume that programs may parse sysfs and take certain actions if, for example, the current GPU supports only a single VDBOX (for example, run some operation as SW decoding rather than HW). So I would suggest having sysfs files for the information above.
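
The programmatic use described above could look roughly like the following sketch. It assumes each attribute is a single integer per file. The engine-count path is hypothetical (no such file is proposed in this thread), and the hardware/software split is only an illustrative policy.

```python
from pathlib import Path

# Hypothetical attribute: this path is an assumption for illustration only.
VDBOX_COUNT_PATH = "/sys/class/drm/card0/engines/vdbox_count"


def read_sysfs_int(path, default=None):
    """Read a single-integer sysfs attribute; return default if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return default


def plan_decoders(vdbox_count, n_streams):
    """Illustrative policy: one stream per VDBOX in HW, the rest in SW."""
    hw_slots = vdbox_count or 0  # treat an unknown count as no HW engines
    return ["hw" if i < hw_slots else "sw" for i in range(n_streams)]


if __name__ == "__main__":
    count = read_sysfs_int(VDBOX_COUNT_PATH)
    print(plan_decoders(count, 3))
```

The same helper would work for any one-value-per-file attribute, which is what makes a plain `cat` usable for quick inspection as well.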

The perf subsystem indeed looks attractive for exposing such metrics, but I think we need to target lower-level metrics with perf. From my perspective, i915 currently fails to expose certain key information that any user or developer naturally expects. Using perf to expose it will force users to use special tools or write their own programs to query it, which will simply reduce usability. After all, why do you expose /sys/class/drm/card0/power/rc6_residency_ms but not how much time GPGPU or VDBOX spent doing its job?! Honestly, RC6 is a second level of detail for a significant part of the users and customers.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



