On Mon, Jul 7, 2014 at 9:43 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
On Tue, Jul 01, 2014 at 08:54:27PM +0100, Chris Wilson wrote:
> On Tue, Jul 01, 2014 at 05:16:30PM +0000, Mateo Lozano, Oscar wrote:
> > > The issue is they need:
> > >
> > > A) A buffer object.
> > > B) Bound to GGTT.
> > > C) That userspace knows the GGTT offset of, so that they can program
> > > OABUFFER with it.
> > > D) That userspace can map so that they can read the reported counters.
> > >
> > > They used to create a bo, call bo_pin on it, use args->offset to program
> > > OABUFFER (via MI_LOAD_REGISTER_IMM, I imagine), map it and read the
> > > counter values. They cannot do this anymore.
> >
> > The answer might be that all of this needs to be done by the kernel
> > itself, but then we need to provide an interface to userspace...
>
> Yes. If you need to pin a buffer for a register, then it needs to be
> handled by the kernel. Especially one that provides information about
> other users.

Short-circuiting the entire discussion here. Afaik there's two OA modes:
- inline with the batch with MI_REPORT_PERF
- global with the ringbuffer setup with the OABUFFER registers
There's one other: on HSW+ we can also read the counters via mmio. I've found that quite convenient for experimenting with capturing OA counters from userspace. A notable disadvantage of reading via mmio, though, is that there's no latch-and-hold mechanism to make sure the counters are frozen while they are read back-to-back.
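To illustrate the missing latch, here is a minimal sketch in C of what back-to-back reads look like when the counters keep running. It is a toy simulation only, not i915 or real register code: `sim_gpu`, `sim_read_counter`, `snapshot_counters` and the "one tick per read" cost are all hypothetical names and assumptions made up for the example. The one idea it shows is bracketing the counter reads with two timestamp reads, so that a tool can at least bound how much the counters may have moved during the snapshot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated device: NOT real hardware or driver code.
 * Assumption: every register read costs one "tick", and counter i
 * advances by rate[i] per tick while we are busy reading. */
#define NUM_COUNTERS 4

struct sim_gpu {
    uint64_t ticks;               /* free-running timestamp */
    uint32_t rate[NUM_COUNTERS];  /* increments per tick, per counter */
};

/* Return the current tick, then advance time by one read's cost. */
static uint64_t sim_read_timestamp(struct sim_gpu *gpu)
{
    return gpu->ticks++;
}

/* Return counter i at the current tick, then advance time.
 * There is no latch: the next read sees a later tick. */
static uint64_t sim_read_counter(struct sim_gpu *gpu, size_t i)
{
    uint64_t value = gpu->ticks * gpu->rate[i];

    gpu->ticks++;
    return value;
}

struct snapshot {
    uint64_t ts_before;           /* timestamp read before the counters */
    uint64_t ts_after;            /* timestamp read after the counters */
    uint64_t value[NUM_COUNTERS];
};

/* Read all counters back-to-back, bracketed by two timestamp reads. */
static void snapshot_counters(struct sim_gpu *gpu, struct snapshot *snap)
{
    snap->ts_before = sim_read_timestamp(gpu);
    for (size_t i = 0; i < NUM_COUNTERS; i++)
        snap->value[i] = sim_read_counter(gpu, i);
    snap->ts_after = sim_read_timestamp(gpu);
}

/* Upper bound (in ticks) on how far apart the counter reads were. */
static uint64_t snapshot_skew(const struct snapshot *snap)
{
    return snap->ts_after - snap->ts_before;
}
```

In this model, counters that advance at the same rate still come back with different values, simply because they were sampled at different ticks. The bracketing timestamps do not remove that skew; they only let the reader estimate it.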
The latter should indeed be fully controlled by the kernel as Chris
suggested and exposed as an off-cpu performance monitoring unit through
the perf subsystem. Chris has rfc patches floating somewhere to do this
for other gpu perf data.
Just to let folks know: I've recently started playing around with Chris' perf patch plus a couple of small patches on top, and I'm planning to experiment with exposing the OA counters this way soon.
I was hoping to try and pick up the discussion of this patch soon too and give some comments, but I'd also like to collect some concrete data as a reference point first, to be more confident in my own understanding of how things are behaving.
One fun thing here is the coordination between these two OA modes since
iirc they both use the same setup registers for the performance counter
configuration. No idea yet how to solve this.
But really userspace shouldn't program ggtt offset, not even
debug/performance measuring tools.
I just wanted to pop my head up here so others are aware that I'm another person looking at this area, aiming to understand how best to make this data available to both GL and tools. Initially I'm considering Mesa's performance query extensions (which don't always report reliable data currently) and tools like intel-gpu-top.
--
Regards,
Robert
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx