Re: [PATCH 1/3] drm/i915: Use readl/writel for ring buffer access

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 14 Apr 2016 16:55:16 +0100

On Thu, Apr 14, 2016 at 04:07:58PM +0100, Dave Gordon wrote:
> On 14/04/16 12:58, Tvrtko Ursulin wrote:
> >
> >On 14/04/16 12:30, Chris Wilson wrote:
> >>On Thu, Apr 14, 2016 at 12:24:20PM +0100, Tvrtko Ursulin wrote:
> >>>
> >>>On 14/04/16 12:16, Chris Wilson wrote:
> >>>>On Thu, Apr 14, 2016 at 11:59:29AM +0100, Tvrtko Ursulin wrote:
> >>>>>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>>>>
> >>>>>We know ringbuffers are memory and not ports so if we use readl
> >>>>>and writel instead of ioread32 and iowrite32 (which dispatch to
> >>>>>the very same functions after checking the address range) we
> >>>>>avoid generating functions calls and branching on every access.
> >>>>
> >>>>We don't need to use readl/write at all, since they are normal memory
> >>>>on llc, and on x86 we can pretend that iomaps (!llc/stolen) are as
> >>>>well.
> >>>
> >>>It is fine to use readl/writel since it translates to a single mov
> >>>instruction anyway on x86.
> >>>
> >>>>This patch is in the queue along with killing the incorrect spare iomem
> >>>>annotation.
> >>>
> >>>Ok did not spot them. Don't mind either way, thought this is quick,
> >>>easy and obvious improvement when I spotted the ugly code generated
> >>>for ring buffer writing.
> >>>
> >>>Mind you it is still not completely pretty with this patch since it
> >>>is full of reloads and adds for ringbuf->virtual_start and tail
> >>>which I can't figure how to help GCC optimize. Unless we make being,
> >>>emit and advance functions return the current tail pointer and also
> >>>accept it. In that case it all shrinks by half.
> >>
> >>We figured out how to help gcc with that in userspace using:
> >>
> >>out = ring_begin(num_dwords);
> >>out[0] = cmd;
> >>out[N] = dwN
> >>
> >>GCC will then do
> >>
> >>mov $imm0, 0x0($eax)
> >>mov $imm1, 0x4($eax)
> >>mov $edx, 0x8($eax)
> >>etc
> >
> >Would be nice, hope it happens soon. :)
> >
> >Regards,
> >Tvrtko
> 
> Another couple of alternative styles:
> 
> 	DWORD* ptr = ring_begin(ring, nwords);
> 	*ptr++ = MI_WHATEVER;
> 	*ptr++ = param1;

GCC turns *ptr++ into mov $val, (%eax); add $4, %eax

It's convenient for translating the for loops, but not as compact.

> 	...
> 	ring_advance(ring, ptr);
> 	// this call checks that 'ptr' has not gone
> 	// beyond the nwords reserved above
> 
> Or collapse it all into one call:
> 
> 	DWORD insns[OP_NWORDS] = {
> 		MI_WHATEVER,
> 		param1,
> 		...
> 	}
> 	ring_append(ring, nwords, insns);
> 
> which combines the check-and-wrap with a block copy to add all the
> instructions in one go :)

Considered that, but not seriously looked into what gcc does there -
mainly because that would involve even more complicated changes to the
code.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx