Re: [igt-dev] [PATCH i-g-t 2/3] tests/gem_eio: Speed up test execution

On 22/03/2018 11:39, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-03-22 11:17:11)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

If we stop relying on regular GPU hangs to be detected, but trigger them
manually as soon as we know our batch of interest is actually executing
on the GPU, we can dramatically speed up various subtests.

This is enabled by the pollable spin batch added in the previous patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Suggested-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Antonio Argenziano <antonio.argenziano@xxxxxxxxx>
---
Note that the 'wait' subtest is mysteriously hanging for me in the no-op
batch sent by gem_test_engines, but only on the RCS engine. TBD while I am
getting some CI results.
---
  lib.tar         | Bin 0 -> 102400 bytes
  tests/gem_eio.c |  97 ++++++++++++++++++++++++++++++++++++++++----------------
  2 files changed, 70 insertions(+), 27 deletions(-)
  create mode 100644 lib.tar

diff --git a/lib.tar b/lib.tar
new file mode 100644
index 0000000000000000000000000000000000000000..ea04fad219a87f2e975852989526f8da4c9b7d6d
GIT binary patch
literal 102400

Looks correct.

Just backing up in the cloud. :))

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 4bcc5937db39..93400056124b 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -71,26 +71,23 @@ static void trigger_reset(int fd)
         gem_quiescent_gpu(fd);
  }
-static void wedge_gpu(int fd)
+static void manual_hang(int drm_fd)
  {
-       /* First idle the GPU then disable GPU resets before injecting a hang */
-       gem_quiescent_gpu(fd);
-
-       igt_require(i915_reset_control(false));
+       int dir = igt_debugfs_dir(drm_fd);
-       igt_debug("Wedging GPU by injecting hang\n");
-       igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
+       igt_sysfs_set(dir, "i915_wedged", "-1");
-       igt_assert(i915_reset_control(true));
+       close(dir);
  }

Ok.

-static void wedgeme(int drm_fd)
+static void wedge_gpu(int fd)
  {
-       int dir = igt_debugfs_dir(drm_fd);
-
-       igt_sysfs_set(dir, "i915_wedged", "-1");
+       /* First idle the GPU then disable GPU resets before injecting a hang */
+       gem_quiescent_gpu(fd);
-       close(dir);
+       igt_require(i915_reset_control(false));
+       manual_hang(fd);
+       igt_assert(i915_reset_control(true));
  }

Ok.

Well done reading that awful diff!

static int __gem_throttle(int fd)
@@ -149,29 +146,66 @@ static int __gem_wait(int fd, uint32_t handle, int64_t timeout)
         return err;
  }
+static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
+{
+       if (gem_can_store_dword(fd, flags))
+               return __igt_spin_batch_new_poll(fd, ctx, flags);
+       else
+               return __igt_spin_batch_new(fd, ctx, flags, 0);
+}
+
+static void __spin_wait(int fd, igt_spin_t *spin)
+{
+       if (spin->running) {
+               while (!*((volatile bool *)spin->running))
+                       ;
+       } else {
+               igt_debug("__spin_wait - usleep mode\n");
+               usleep(500e3); /* Better than nothing! */
+       }
+}
+
+/*
+ * Wedge the GPU when we know our batch is running.
+ */
+static void wedge_after_running(int fd, igt_spin_t *spin)
+{
+       __spin_wait(fd, spin);
+       manual_hang(fd);
+}
+
  static void test_wait(int fd)
  {
-       igt_hang_t hang;
+       struct timespec ts = { };
+       igt_spin_t *hang;
         igt_require_gem(fd);
+
+       igt_nsec_elapsed(&ts);
+
         /* If the request we wait on completes due to a hang (even for
          * that request), the user expects the return value to 0 (success).
          */
-       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
-       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
-       igt_post_hang_ring(fd, hang);
+       igt_require(i915_reset_control(true));
+       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
+       wedge_after_running(fd, hang);
+       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
+       igt_spin_batch_free(fd, hang);

         /* If the GPU is wedged during the wait, again we expect the return
          * value to be 0 (success).
          */
         igt_require(i915_reset_control(false));
-       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
-       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
-       igt_post_hang_ring(fd, hang);
+       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
+       wedge_after_running(fd, hang);
+       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
+       igt_spin_batch_free(fd, hang);
         igt_require(i915_reset_control(true));

Hmm. These are not equivalent to the original test. The test
requires hangcheck to kick in while the test is blocked in igt_wait.
To do a fast equivalent, we need to kick off a timer. (Here we are just
asking whether a wait on an already completed request doesn't block, not how
we handle a reset in the middle of a wait. Seems a reasonable addition,
though.)

I think that's a general pattern worth repeating for the rest of tests:
don't immediately inject the hang, but leave it a few milliseconds to
allow us to block on the subsequent wait. I would even repeat the tests
a few times with different timeouts; 0, 1us, 10ms (thinking of the
different phases for i915_request_wait).

True, it's not the same.

Makes sense to test with different timeouts. Will do.


         trigger_reset(fd);
+
+       /* HACK for CI */
+       igt_assert(igt_nsec_elapsed(&ts) < 5e9);

igt_seconds_elapsed(): the approximation is worth the readability.

In this case you might like to try igt_set_timeout(), as I think the
per-subtest exit handlers are in place to make the subtests robust against
premature failures.

Well, this was just to see what will happen on the shards. As mentioned in the commit message, I get an as-yet unexplained GPU hang at subtest exit here, so the assert above is just there to notice whether the same happens on the shards.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



