Quoting Tvrtko Ursulin (2019-11-19 16:33:18)
> 
> On 19/11/2019 16:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-19 15:04:46)
> >>
> >> On 18/11/2019 23:02, Chris Wilson wrote:
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 33ce258d484f..f7c8fec436a9 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -142,6 +142,7 @@
> >>>  #include "intel_engine_pm.h"
> >>>  #include "intel_gt.h"
> >>>  #include "intel_gt_pm.h"
> >>> +#include "intel_gt_requests.h"
> >>>  #include "intel_lrc_reg.h"
> >>>  #include "intel_mocs.h"
> >>>  #include "intel_reset.h"
> >>> @@ -2278,6 +2279,18 @@ static void execlists_submission_tasklet(unsigned long data)
> >>>  		if (timeout && preempt_timeout(engine))
> >>>  			preempt_reset(engine);
> >>>  	}
> >>> +
> >>> +	/*
> >>> +	 * If the GPU is currently idle, retire the outstanding completed
> >>> +	 * requests. This will allow us to enter soft-rc6 as soon as possible,
> >>> +	 * albeit at the cost of running the retire worker much more frequently
> >>> +	 * (over the entire GT not just this engine) and emitting more idle
> >>> +	 * barriers (i.e. kernel context switches unpinning all that went
> >>> +	 * before) which may add some extra latency.
> >>> +	 */
> >>> +	if (intel_engine_pm_is_awake(engine) &&
> >>> +	    !execlists_active(&engine->execlists))
> >>> +		intel_gt_schedule_retire_requests(engine->gt);
> >>
> >> I am still not a fan of doing this for all platforms.
> > 
> > I understand. I think it makes a fair amount of sense to do early
> > retires, and wish to pursue that if I can show there is no harm.
> It's also a bit of a layering problem.

Them's fighting words! :)

> >> Its not just the cost of retirement but there is
> >> intel_engine_flush_submission on all engines in there as well which we
> >> cannot avoid triggering from this path.
> >>
> >> Would it be worth experimenting with additional per-engine retire
> >> workers? Most of the code could be shared, just a little bit of
> >> specialization to filter on engine.
> > 
> > I haven't sketched out anything more than peeking at the last request on
> > the timeline and doing a rq->engine == engine filter. Walking the global
> > timeline.active_list in that case is also a nuisance.
> 
> That together with:
> 
> 	flush_submission(gt, engine ? engine->mask : ALL_ENGINES);
> 
> Might be enough? At least to satisfy my concern.

Aye, flushing all the others when we know we only care about this engine
being idle is definitely a weak point of the current scheme.

> Apart layering is still bad.. And I'd still limit it to when RC6 WA is
> active unless it can be shown there is no perf/power impact across
> GPU/CPU to do this everywhere.

Bah, keep tuning until it's a win for everyone!

> At which point it becomes easier to just limit it because we have to
> have it there.
> 
> I also wonder if the current flush_submission wasn't the reason for
> performance regression you were seeing with this? It makes this tasklet
> wait for all other engines, if they are busy. But not sure.. perhaps it
> is work which would be done anyway.

I haven't finished yet; but the baseline took a big nose dive, so it
might be enough to hide a lot of evil. Too bad I don't have an Icelake
with which to cross check against an unaffected platform.

> > There's definitely scope here for us using some more information from
> > process_csb() about which context completed and limit work to that
> > timeline.

Hmm, something along those lines maybe...
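Purely as a strawman of the rq->engine == engine filter mentioned above
(the helper is hypothetical, the field names such as timelines.active_list,
tl->requests and rq->link are approximations from memory rather than a
verified interface, and the locking/lifetime rules are hand-waved):

static void engine_retire_requests(struct intel_engine_cs *engine)
{
	struct intel_gt_timelines *timelines = &engine->gt->timelines;
	struct intel_timeline *tl, *tn;
	unsigned long flags;

	/* Kick only this engine's submission, not ALL_ENGINES. */
	intel_engine_flush_submission(engine);

	spin_lock_irqsave(&timelines->lock, flags);
	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
		struct i915_request *rq;

		if (list_empty(&tl->requests))
			continue;

		/* Peek at the most recent request on this timeline... */
		rq = list_last_entry(&tl->requests, typeof(*rq), link);

		/* ...and skip timelines that last ran on another engine. */
		if (rq->engine != engine)
			continue;

		/* Real code would have to drop the lock around retirement. */
		retire_requests(tl);
	}
	spin_unlock_irqrestore(&timelines->lock, flags);
}

The intent being that parking one engine neither flushes nor retires work
that belongs solely to the others.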
> 
> But you want to retire all timelines which have work on this particular
> physical engine. Otherwise it doesn't get parked, no?

There I was suggesting being even more proactive, and, say, keeping an
llist of completed timelines. Nothing concrete yet; plenty of existing
races found already that need fixing.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
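To make the llist idea slightly less hand-wavy (everything below is a
hypothetical sketch; only the <linux/llist.h> calls are the real kernel
API, the structure member and both helpers are invented for illustration):
the CSB-processing side would note the timeline of each context it sees
complete, and the retire side would then drain just that list instead of
walking the whole of timelines.active_list.

#include <linux/llist.h>

/* Hypothetical: a lockless "completed" hook embedded in each timeline. */
struct sketch_timeline {
	struct llist_node completed_link;
	/* ... requests, etc. ... */
};

/* Would live in struct intel_gt rather than be a global. */
static LLIST_HEAD(completed_timelines);

/*
 * Producer, called from CSB processing when a context switches out with
 * all of its requests complete.  llist_add() is lockless, so it is safe
 * from the submission tasklet, but a real version would also have to
 * guard against queuing the same timeline twice before it is drained.
 */
static void note_completed_timeline(struct sketch_timeline *tl)
{
	llist_add(&tl->completed_link, &completed_timelines);
}

/* Consumer: retire only the timelines we know have just gone idle. */
static void retire_completed_timelines(void)
{
	struct sketch_timeline *tl, *tn;

	llist_for_each_entry_safe(tl, tn,
				  llist_del_all(&completed_timelines),
				  completed_link) {
		/* the retire_requests(tl) equivalent would go here */
	}
}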