On Fri, Feb 12, 2016 at 12:00:40PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> Assorted changes most likely without any practical effect
> apart from a tiny reduction in generated code for the interrupt
> handler and request submission.
>
>  * Remove needless initialization.
>  * Improve cache locality by reorganizing code and/or using
>    branch hints to keep unexpected or error conditions out
>    of line.
>  * Favor busy submit path vs. empty queue.
>  * Less branching in hot-paths.
>
> v2:
>
>  * Avoid mmio reads when possible. (Chris Wilson)
>  * Use natural integer size for csb indices.
>  * Remove useless return value from execlists_update_context.
>  * Extract 32-bit ppgtt PDPs update so it is out of line and
>    shared with two callers.
>  * Grab forcewake across all mmio operations to ease the
>    load on uncore lock and use cheaper mmio ops.
>
> Version 2 now makes the irq handling code path ~20% smaller on
> 48-bit PPGTT hardware, and a little bit less elsewhere. Hot
> paths are mostly in-line now and hammering on the uncore
> spinlock is greatly reduced together with mmio traffic to an
> extent.

Did you notice that ring->next_context_status_buffer is redundant, as
we also have that information to hand in status_pointer?

What's your thinking for

	if (req->elsp_submitted & ring->gen8_9)

vs a plain

	if (req->elsp_submitted)

?

The tidies look good. It would be useful to double check whether
gem_latency is behaving as a canary; it's a bit of a puzzle why that
first dispatch latency would grow.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
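
For reference on the elsp_submitted question, a small standalone sketch
of why the bitwise form and the plain form need not agree. It assumes
elsp_submitted is a small count that can reach 2 and gen8_9 is a 0/1
flag; neither is confirmed in this thread, and the struct and function
names below are made up for illustration, not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for req->elsp_submitted and ring->gen8_9. */
struct sketch_request {
	int elsp_submitted;	/* assumed: a small count, e.g. 0, 1 or 2 */
};

struct sketch_engine {
	bool gen8_9;		/* assumed: a 0/1 flag */
};

/* Bitwise form: only bit 0 of the count survives the AND with a 0/1 flag. */
static bool submitted_bitwise(const struct sketch_request *req,
			      const struct sketch_engine *ring)
{
	return (req->elsp_submitted & ring->gen8_9) != 0;
}

/* Logical form: any non-zero count passes when the flag is set. */
static bool submitted_logical(const struct sketch_request *req,
			      const struct sketch_engine *ring)
{
	return req->elsp_submitted && ring->gen8_9;
}

/* Plain form: ignores the flag entirely. */
static bool submitted_plain(const struct sketch_request *req)
{
	return req->elsp_submitted != 0;
}
```

Under those assumptions, a count of 1 with the flag set gives the same
answer in all three forms, but a count of 2 does not: 2 & 1 is 0, so
the bitwise form is false while the logical and plain forms are true.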