Re: [PATCH v1] drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 18/03/2022 07:39, Kasireddy, Vivek wrote:
Hi Tvrtko,


On 17/03/2022 07:23, Vivek Kasireddy wrote:
On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
more framebuffers/scanout buffers results in only one that is mappable/
fenceable. Therefore, pageflipping between these 2 FBs where only one
is mappable/fenceable creates latencies large enough to miss alternate
vblanks thereby producing less optimal framerate.

This mainly happens because when i915_gem_object_pin_to_display_plane()
is called to pin one of the FB objs, the associated vma is identified
as misplaced -- because there is no space for it in the aperture --
and therefore i915_vma_unbind() is called which unbinds and evicts it.
This misplaced vma gets subseqently pinned only when
i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole
thing results in a latency of ~10ms and happens every other repaint cycle.

Just out of curiosity - have you looked at where does this 10ms come
from? Like is it simply clearing/writing PTEs so expensive, or there is
more to it? Apologies if I asked this before..
[Kasireddy, Vivek] It appears the ~10ms latency seems to come from the
execution of gen8_ggtt_clear_range() as seen in the trace log:
           weston-4124  [008] 161210.397563: funcgraph_entry:                   |                          __i915_vma_evict() {
           weston-4124  [008] 161210.397564: funcgraph_entry:                   |                            ggtt_unbind_vma() {
           weston-4124  [008] 161210.397564: funcgraph_entry:                   |                              gen8_ggtt_clear_range() {
           weston-4124  [008] 161210.408012: funcgraph_exit:       # 10416.281 us |                              }
           weston-4124  [008] 161210.408012: funcgraph_exit:       # 10448.740 us |                            }
           weston-4124  [008] 161210.408012: funcgraph_entry:                   |                            __vma_put_pages() {
           weston-4124  [008] 161210.408013: funcgraph_entry:        0.083 us   |                              clear_pages();
           weston-4124  [008] 161210.408013: funcgraph_exit:         0.578 us   |                            }
           weston-4124  [008] 161210.408013: funcgraph_exit:       # 10449.622 us |                          }

And, for some reason, I can't get trace-cmd to capture more symbols to
gain more insights. However, gen8_ggtt_clear_range() seems to just do
this:
         for (i = 0; i < num_entries; i++)
                 gen8_set_pte(&gtt_base[i], scratch_pte);
where,
vma's start = 182190080, length = 132710400, first = 44480, num_entries = 32400
and we have
void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
{
         writeq(pte, addr);
}

Any idea why executing writeq 32400 times would result in a latency
of ~10ms?

Uncached writes I guess, due VT-d workaround being required to avoid display engine faulting due overfetching.

Something like https://patchwork.freedesktop.org/series/97492/ would have probably resolved this issue as well. I guess time to go and read that series in detail.

Regards,

Tvrtko



Therefore, to fix this issue, we just ensure that the misplaced VMA
does not get evicted when we try to pin it with PIN_MAPPABLE -- by
returning early if the mappable/fenceable flag is not set.

Testcase:
Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
with a 8K@60 mode results in only ~40 FPS (compared to ~59 FPS with
this patch). Since upstream Weston submits a frame ~7ms before the
next vblank, the latencies seen between atomic commit and flip event
are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the
vblank every other frame.

Here is the ftrace snippet that shows the source of the ~10ms latency:
                i915_gem_object_pin_to_display_plane() {
0.102 us   |    i915_gem_object_set_cache_level();
                  i915_gem_object_ggtt_pin_ww() {
0.390 us   |      i915_vma_instance();
0.178 us   |      i915_vma_misplaced();
                    i915_vma_unbind() {
                    __i915_active_wait() {
0.082 us   |        i915_active_acquire_if_busy();
0.475 us   |      }
                    intel_runtime_pm_get() {
0.087 us   |        intel_runtime_pm_acquire();
0.259 us   |      }
                    __i915_active_wait() {
0.085 us   |        i915_active_acquire_if_busy();
0.240 us   |      }
                    __i915_vma_evict() {
                      ggtt_unbind_vma() {
                        gen8_ggtt_clear_range() {
10507.255 us |        }
10507.689 us |      }
10508.516 us |   }

Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
   1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9747924cc57b..7307c5de1c58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -939,8 +939,14 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object
*obj,
   			if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
   				return ERR_PTR(-ENOSPC);

+			/*
+			 * If this misplaced vma is too big (i.e, at-least
+			 * half the size of aperture) or just unmappable,
+			 * we would not be able to pin with PIN_MAPPABLE.
+			 */

I would be tempted to describe the ping-pong issue in the comment. In
short would do it, but just because git blame on a line of code tends to
fail to lead to the correct commit message after a while.

So maybe just say along the lines of "If the misplaced vma is too big
... or hasn't been pinned mappable before, we ignore the misplacement
when PIN_NONBLOCK is set in order to avoid ping-pong of double (or more)
-buffered frame buffers into the aperture and out on every vblank."
[Kasireddy, Vivek] Ok, will include more comments in v2.


   			if (flags & PIN_MAPPABLE &&
-			    vma->fence_size > ggtt->mappable_end / 2)
+			    (vma->fence_size > ggtt->mappable_end / 2 ||
+			    !i915_vma_is_map_and_fenceable(vma)))
   				return ERR_PTR(-ENOSPC);
   		}


With the expanded comment it looks good to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
[Kasireddy, Vivek] Thank you for the review.

Thanks,
Vivek

+ Daniel if he wants to double check it.

Regards,

Tvrtko



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux