On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote: > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > We employ a "waitboost" heuristic to detect when userspace is stalled > waiting for results from earlier execution. Under latency sensitive work > mixed between the gpu/cpu, the GPU is typically under-utilised and so > RPS sees that low utilisation as a reason to downclock the frequency, > causing longer stalls and lower throughput. The user left waiting for > the results is not impressed. > > On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv > workaround") it was observed that deinterlacing h264 on Haswell > performance dropped by 2-5x. The reason being that the natural workload > was not intense enough to trigger RPS (using HW evaluation intervals) to > upclock, and so it was depending on waitboosting for the throughput. > > Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") > changes the composition of dma-resv from keeping a single write fence + > multiple read fences, to a single array of multiple write and read > fences (a maximum of one pair of write/read fences per context). The > iteration order was also changed implicitly from all-read fences then > the single write fence, to a mix of write fences followed by read > fences. It is that ordering change that belied the fragility of > waitboosting. > > Currently, a waitboost is inspected at the point of waiting on an > outstanding fence. If the GPU is backlogged such that we haven't yet > stated the request we need to wait on, we force the GPU to upclock until > the completion of that request. By changing the order in which we waited > upon requests, we ended up waiting on those requests in sequence and as > such we saw that each request was already started and so not a suitable > candidate for waitboosting. > > Instead of Okay, all the explanation makes sense. But this commit message and the cover letter tells that we are doing X *Instead* *of* Y. That would mean code for Y would be removed. But this patch just add X. So it looks to me that we are adding extra boosts with the code below. What am I missing? asking whether to boost each fence in turn, we can look at > whether boosting is required for the dma-resv ensemble prior to waiting > on any fence, making the heuristic more robust to the order in which > fences are stored in the dma-resv. > > Reported-by: Thomas Voegtle <tv@xxxxxxxx> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284 > Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Signed-off-by: Karolina Drobnik <karolina.drobnik@xxxxxxxxx> > Tested-by: Thomas Voegtle <tv@xxxxxxxx> > --- > drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > index 319936f91ac5..3fbb464746e1 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > @@ -9,6 +9,7 @@ > #include <linux/jiffies.h> > > #include "gt/intel_engine.h" > +#include "gt/intel_rps.h" > > #include "i915_gem_ioctls.h" > #include "i915_gem_object.h" > @@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence, > timeout); > } > > +static void > +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags) > +{ > + struct dma_resv_iter cursor; > + struct dma_fence *fence; > + > + /* > + * Prescan all fences for potential boosting before we begin waiting. > + * > + * When we wait, we wait on outstanding fences serially. If the > + * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced > + * form 1:2, then as we look at each wait in turn we see that each > + * request is currently executing and not worthy of boosting. But if > + * we only happen to look at the final fence in the sequence (because > + * of request coalescing or splitting between read/write arrays by > + * the iterator), then we would boost. As such our decision to boost > + * or not is delicately balanced on the order we wait on fences. > + * > + * So instead of looking for boosts sequentially, look for all boosts > + * upfront and then wait on the outstanding fences. > + */ > + > + dma_resv_iter_begin(&cursor, resv, > + dma_resv_usage_rw(flags & I915_WAIT_ALL)); > + dma_resv_for_each_fence_unlocked(&cursor, fence) { > + if (dma_fence_is_i915(fence) && > + !i915_request_started(to_request(fence))) > + intel_rps_boost(to_request(fence)); > + } > + dma_resv_iter_end(&cursor); > +} > + > static long > i915_gem_object_wait_reservation(struct dma_resv *resv, > unsigned int flags, > @@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv, > struct dma_fence *fence; > long ret = timeout ?: 1; > > + i915_gem_object_boost(resv, flags); > + > dma_resv_iter_begin(&cursor, resv, > dma_resv_usage_rw(flags & I915_WAIT_ALL)); > dma_resv_for_each_fence_unlocked(&cursor, fence) { > -- > 2.25.1 >