I spent yonks trying to define tests that produce reliable results for demonstrating the impact of the cmdparser without requiring inspection of a perf profile. So far, the only thing I can demonstrate with any reliability (gen7 thermal throttling makes life difficult) is the impact of using vmap + WC. Improving the hash function still relies on inspecting the perf profile of real applications (i.e. games), where the easiest metrics to gather, such as frame times, are dominated by the render time. Nor do I have a metric that is sensitive to timing issues, such as the bug reported in "libva decoding performance regression with kernel 4.0-rc" 1428627643.3417.22.camel@xxxxxxxxxxxxx

What I can demonstrate is that eliminating the vmap overhead improves throughput by about 2x on small batches, and that using WC on byt improves throughput by a further 30% or so. And from that bug report thread, applying the patches prevented the missed deadlines.

Despite all of this, the cmdparser still imposes severe overhead (e.g. a 2x throughput reduction on batches).

-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx