On Tue, Apr 25, 2017 at 01:10:34PM +0100, Tvrtko Ursulin wrote: > > On 25/04/2017 12:35, Chris Wilson wrote: > >On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> [snip] > >>+static enum intel_engine_id > >>+rt_balance(const struct workload_balancer *balancer, > >>+ struct workload *wrk, struct w_step *w) > >>+{ > >>+ enum intel_engine_id engine; > >>+ long qd[NUM_ENGINES]; > >>+ unsigned int n; > >>+ > >>+ igt_assert(w->engine == VCS); > >>+ > >>+ /* Estimate the "speed" of the most recent batch > >>+ * (finish time - submit time) > >>+ * and use that as an approximate for the total remaining time for > >>+ * all batches on that engine. We try to keep the total remaining > >>+ * balanced between the engines. > >>+ */ > > > >Next steps for this would be to move from an instantaneous speed, to an > >average. I'm thinking something like a exponential decay moving average > >just to make the estimation more robust. > > Do you think it would be OK to merge these two tools at this point > and continue improving them in place? Yes. Although there's no excuse no to make this NO_RELOC from the start, especially if we want to demonstrate how it should be done! Hopefully attached the delta. > Your balancer already looks a solid step up from the queue-depth > one. I checked today myself and, what looks like a worst case of a > VCS1 hog and a balancing workloads running together, it gets the > VCS2 utilisation to impressive 85%. Yup. Just thinking of the danger in using an instantaneous value as our estimate and how to improve. > As mentioned before those stats can now be collected easily with: > > trace.pl --trace gem_wsim ...; perf script | trace.pl > > I need to start pining the relevant people for help with creating > relevant workloads and am also entertaining the idea of trying > balancing via exporting the stats from i915 directly. Just to see if > true vs estimated numbers would make a difference here. Aye. > >>+ if (qd_throttle > 0 && balancer && balancer->get_qd) { > >>+ unsigned int target; > >>+ > >>+ for (target = wrk->nr_steps - 1; target > 0; > >>+ target--) { > > > >I think this should skip other engines. > > > >if (target->engine != engine) > > continue; > > If you say so. I don't have an opinion on it. Would it be useful to > perhaps have both options - to throttle globally and per-engine? I > could easily add two different workload commands for that. I'm leaning towards making it a balancer->callback(). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx