On Wed, Nov 04, 2015 at 01:20:33PM +0000, Gong, Zhipeng wrote:
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > Sent: Wednesday, November 04, 2015 5:54 PM
> > On Wed, Nov 04, 2015 at 06:19:33AM +0000, Gong, Zhipeng wrote:
> > > > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > > > On Tue, Nov 03, 2015 at 01:31:22PM +0000, Gong, Zhipeng wrote:
> > > > > > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > > > > >
> > > > > > Do you also have a relative perf statistic like op/s we can
> > > > > > compare to make sure we aren't just stalling the whole system?
> > > > >
> > > > > Could you please provide the commands for how to check it?
> > > >
> > > > I was presuming your workload has some measure of efficiency/throughput?
> > > > It is one thing to say we are using 10% less CPU (per second), but
> > > > the task is running 2x as long!
> > >
> > > We use execution time as a measurement; the patch affects the execution
> > > time for our cases only slightly.
> > >
> > > Exec time (s) | w/o patch | w/ patch
> > > --------------+-----------+---------
> > > BDW async 1   |   65.00   |   65.25
> > > BDW async 5   |   68.50   |   66.42
> >
> > That's reassuring.
> >
> > > > > > How much cpu time is left in the i915_wait_request branch? i.e.
> > > > > > how close to the limit are we with chasing this path?
> > > > >
> > > > > Could you please provide the commands here as well? :)
> > > >
> > > > Check the perf callgraph.
> > >
> > > Now most of the time is in io_schedule_timeout:
> > >
> > > __i915_wait_request
> > > |--64.04%-- io_schedule_timeout
> > > |--22.04%-- intel_engine_add_wakeup
> > > |--3.13%-- prepare_to_wait
> > > |--2.99%-- gen6_rps_boost
> > > |-...
> >
> > No more busywaits, and most of the time is spent kicking the next process or
> > doing the insertion sort into the waiting rbtree.
> >
> > What's the ratio now of __i915_wait_request to the next hot function?
> > And who are the chief callers of __i915_wait_request?
> > -Chris
>
> Please check the attachments for the details; I post a piece of it here:
>
> |--17.89%-- i915_gem_object_sync
> |--73.19%-- __i915_wait_request
> |--12.60%-- i915_gem_object_retire_request

Interesting. Most of the time is spent shuffling requests around in the
execbuffer rather than doing useful work. I've been working on moving that
work around, but even then we are likely to be spending our time
instantiating all those new objects.

As far as trimming the CPU time from __i915_wait_request() goes, that looks
about as far as we can go.

If you have some free cycles on those machines, I would very much appreciate
seeing the same callgraphs from a
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=134211e33719ef698f9bd51b72ad2fc434cb51f9
kernel.

Thanks,
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
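[Editor's note: the thread answers "how do I get the callgraph?" only with
"Check the perf callgraph." A minimal sketch of such a session follows; the
sampling duration and the choice of system-wide sampling are assumptions for
illustration, not commands taken from the thread.]

```shell
# Record system-wide samples with callchains for 10 seconds.
# -a samples all CPUs; -g captures the call stack at each sample,
# which is what produces the percentage trees quoted above.
perf record -a -g -- sleep 10

# Summarize with children-inclusive percentages, so a symbol such as
# __i915_wait_request is shown together with its callers and callees.
perf report --children --sort symbol

# Alternatively, print a caller-oriented graph (who calls whom) to stdout,
# hiding entries below a 0.5% threshold.
perf report -g graph,0.5,caller --stdio
```

Interactive `perf report` lets you expand a hot symbol to inspect its call
tree directly, which is how a ratio between __i915_wait_request and the next
hot function can be read off.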