On Thu, Apr 14, 2016 at 04:07:58PM +0100, Dave Gordon wrote: > On 14/04/16 12:58, Tvrtko Ursulin wrote: > > > >On 14/04/16 12:30, Chris Wilson wrote: > >>On Thu, Apr 14, 2016 at 12:24:20PM +0100, Tvrtko Ursulin wrote: > >>> > >>>On 14/04/16 12:16, Chris Wilson wrote: > >>>>On Thu, Apr 14, 2016 at 11:59:29AM +0100, Tvrtko Ursulin wrote: > >>>>>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>>> > >>>>>We know ringbuffers are memory and not ports so if we use readl > >>>>>and writel instead of ioread32 and iowrite32 (which dispatch to > >>>>>the very same functions after checking the address range) we > >>>>>avoid generating functions calls and branching on every access. > >>>> > >>>>We don't need to use readl/write at all, since they are normal memory > >>>>on llc, and on x86 we can pretend that iomaps (!llc/stolen) are as > >>>>well. > >>> > >>>It is fine to use readl/writel since it translates to a single mov > >>>instruction anyway on x86. > >>> > >>>>This patch is in the queue along with killing the incorrect spare iomem > >>>>annotation. > >>> > >>>Ok did not spot them. Don't mind either way, thought this is quick, > >>>easy and obvious improvement when I spotted the ugly code generated > >>>for ring buffer writing. > >>> > >>>Mind you it is still not completely pretty with this patch since it > >>>is full of reloads and adds for ringbuf->virtual_start and tail > >>>which I can't figure how to help GCC optimize. Unless we make being, > >>>emit and advance functions return the current tail pointer and also > >>>accept it. In that case it all shrinks by half. > >> > >>We figured out how to help gcc with that in userspace using: > >> > >>out = ring_begin(num_dwords); > >>out[0] = cmd; > >>out[N] = dwN > >> > >>GCC will then do > >> > >>mov $imm0, 0x0($eax) > >>mov $imm1, 0x4($eax) > >>mov $edx, 0x8($eax) > >>etc > > > >Would be nice, hope it happens soon. :) > > > >Regards, > >Tvrtko > > Another couple of alternative styles: > > DWORD* ptr = ring_begin(ring, nwords); > *ptr++ = MI_WHATEVER; > *ptr++ = param1; GCC turns *ptr++ into mov $val, (%eax); add $4, %eax It's convenient for translating the for loops, but not as compact. > ... > ring_advance(ring, ptr); > // this call checks that 'ptr' has not gone > // beyond the nwords reserved above > > Or collapse it all into one call: > > DWORD insns[OP_NWORDS] = { > MI_WHATEVER, > param1, > ... > } > ring_append(ring, nwords, insns); > > which combines the check-and-wrap with a block copy to add all the > instructions in one go :) Considered that, but not seriously looked into what gcc does there - mainly because that would involve even more complicated changes to the code. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx