Hi Tvrtko, > > On 17/03/2022 07:23, Vivek Kasireddy wrote: > > On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or > > more framebuffers/scanout buffers results in only one that is mappable/ > > fenceable. Therefore, pageflipping between these 2 FBs where only one > > is mappable/fenceable creates latencies large enough to miss alternate > > vblanks thereby producing less optimal framerate. > > > > This mainly happens because when i915_gem_object_pin_to_display_plane() > > is called to pin one of the FB objs, the associated vma is identified > > as misplaced -- because there is no space for it in the aperture -- > > and therefore i915_vma_unbind() is called which unbinds and evicts it. > > This misplaced vma gets subseqently pinned only when > > i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole > > thing results in a latency of ~10ms and happens every other repaint cycle. > > Just out of curiosity - have you looked at where does this 10ms come > from? Like is it simply clearing/writing PTEs so expensive, or there is > more to it? Apologies if I asked this before.. [Kasireddy, Vivek] It appears the ~10ms latency seems to come from the execution of gen8_ggtt_clear_range() as seen in the trace log: weston-4124 [008] 161210.397563: funcgraph_entry: | __i915_vma_evict() { weston-4124 [008] 161210.397564: funcgraph_entry: | ggtt_unbind_vma() { weston-4124 [008] 161210.397564: funcgraph_entry: | gen8_ggtt_clear_range() { weston-4124 [008] 161210.408012: funcgraph_exit: # 10416.281 us | } weston-4124 [008] 161210.408012: funcgraph_exit: # 10448.740 us | } weston-4124 [008] 161210.408012: funcgraph_entry: | __vma_put_pages() { weston-4124 [008] 161210.408013: funcgraph_entry: 0.083 us | clear_pages(); weston-4124 [008] 161210.408013: funcgraph_exit: 0.578 us | } weston-4124 [008] 161210.408013: funcgraph_exit: # 10449.622 us | } And, for some reason, I can't get trace-cmd to capture more symbols to gain more insights. However, gen8_ggtt_clear_range() seems to just do this: for (i = 0; i < num_entries; i++) gen8_set_pte(>t_base[i], scratch_pte); where, vma's start = 182190080, length = 132710400, first = 44480, num_entries = 32400 and we have void gen8_set_pte(void __iomem *addr, gen8_pte_t pte) { writeq(pte, addr); } Any idea why executing writeq 32400 times would result in a latency of ~10ms? > > > Therefore, to fix this issue, we just ensure that the misplaced VMA > > does not get evicted when we try to pin it with PIN_MAPPABLE -- by > > returning early if the mappable/fenceable flag is not set. > > > > Testcase: > > Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform > > with a 8K@60 mode results in only ~40 FPS (compared to ~59 FPS with > > this patch). Since upstream Weston submits a frame ~7ms before the > > next vblank, the latencies seen between atomic commit and flip event > > are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the > > vblank every other frame. > > > > Here is the ftrace snippet that shows the source of the ~10ms latency: > > i915_gem_object_pin_to_display_plane() { > > 0.102 us | i915_gem_object_set_cache_level(); > > i915_gem_object_ggtt_pin_ww() { > > 0.390 us | i915_vma_instance(); > > 0.178 us | i915_vma_misplaced(); > > i915_vma_unbind() { > > __i915_active_wait() { > > 0.082 us | i915_active_acquire_if_busy(); > > 0.475 us | } > > intel_runtime_pm_get() { > > 0.087 us | intel_runtime_pm_acquire(); > > 0.259 us | } > > __i915_active_wait() { > > 0.085 us | i915_active_acquire_if_busy(); > > 0.240 us | } > > __i915_vma_evict() { > > ggtt_unbind_vma() { > > gen8_ggtt_clear_range() { > > 10507.255 us | } > > 10507.689 us | } > > 10508.516 us | } > > > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 9747924cc57b..7307c5de1c58 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -939,8 +939,14 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object > *obj, > > if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma)) > > return ERR_PTR(-ENOSPC); > > > > + /* > > + * If this misplaced vma is too big (i.e, at-least > > + * half the size of aperture) or just unmappable, > > + * we would not be able to pin with PIN_MAPPABLE. > > + */ > > I would be tempted to describe the ping-pong issue in the comment. In > short would do it, but just because git blame on a line of code tends to > fail to lead to the correct commit message after a while. > > So maybe just say along the lines of "If the misplaced vma is too big > ... or hasn't been pinned mappable before, we ignore the misplacement > when PIN_NONBLOCK is set in order to avoid ping-pong of double (or more) > -buffered frame buffers into the aperture and out on every vblank." [Kasireddy, Vivek] Ok, will include more comments in v2. > > > if (flags & PIN_MAPPABLE && > > - vma->fence_size > ggtt->mappable_end / 2) > > + (vma->fence_size > ggtt->mappable_end / 2 || > > + !i915_vma_is_map_and_fenceable(vma))) > > return ERR_PTR(-ENOSPC); > > } > > > > With the expanded comment it looks good to me. > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> [Kasireddy, Vivek] Thank you for the review. Thanks, Vivek > > + Daniel if he wants to double check it. > > Regards, > > Tvrtko