On 23/09/2019 16:43, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-09-23 15:29:11)
On 20/09/2019 23:26, Chris Wilson wrote:
The expectation for bonded submission is that they are run concurrently,
in parallel on multiple engines. However, given a lack of constraints in
the scheduler's selection combined with timeslicing could mean that the
bonded requests could be run in opposite order on the same engine. With
just the right pair of requests, this can cause a GPU hang (or at least
trigger hangchecker), best (worst) case would be execution running
several times slower than ideal.
I don't see any bonding being setup?
[comes back later]
Oh you used only the submit fence and not actually bonds. But you also
don't use the virtual engine at all?
A is using either of the 2 real engines, B is using the virtual engine
to select the other available engine. Bonding in this case is just that
the requests are bonded together to run in parallel.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
tests/i915/gem_exec_balancer.c | 151 +++++++++++++++++++++++++++++++++
1 file changed, 151 insertions(+)
diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
index 407dc0eca..e4fe75747 100644
--- a/tests/i915/gem_exec_balancer.c
+++ b/tests/i915/gem_exec_balancer.c
@@ -30,6 +30,15 @@
IGT_TEST_DESCRIPTION("Exercise in-kernel load-balancing");
+#define MI_SEMAPHORE_WAIT (0x1c << 23)
+#define MI_SEMAPHORE_POLL (1 << 15)
+#define MI_SEMAPHORE_SAD_GT_SDD (0 << 12)
+#define MI_SEMAPHORE_SAD_GTE_SDD (1 << 12)
+#define MI_SEMAPHORE_SAD_LT_SDD (2 << 12)
+#define MI_SEMAPHORE_SAD_LTE_SDD (3 << 12)
+#define MI_SEMAPHORE_SAD_EQ_SDD (4 << 12)
+#define MI_SEMAPHORE_SAD_NEQ_SDD (5 << 12)
+
#define INSTANCE_COUNT (1 << I915_PMU_SAMPLE_INSTANCE_BITS)
static size_t sizeof_load_balance(int count)
@@ -694,6 +703,145 @@ static void bonded(int i915, unsigned int flags)
gem_context_destroy(i915, master);
}
+static unsigned int offset_in_page(void *addr)
+{
+ return (uintptr_t)addr & 4095;
+}
+
+static uint32_t create_semaphore_to_spinner(int i915, igt_spin_t *spin)
+{
+ uint32_t *cs, *map;
+ uint32_t handle;
+
+ handle = gem_create(i915, 4096);
+ cs = map = gem_mmap__cpu(i915, handle, 0, 4096, PROT_WRITE);
+
+ /* Wait until the spinner is running */
+ *cs++ = MI_SEMAPHORE_WAIT |
+ MI_SEMAPHORE_POLL |
+ MI_SEMAPHORE_SAD_NEQ_SDD |
+ (4 - 2);
+ *cs++ = 0;
+ *cs++ = spin->obj[0].offset + 4 * SPIN_POLL_START_IDX;
+ *cs++ = 0;
+
+ /* Then cancel the spinner */
+ *cs++ = MI_STORE_DWORD_IMM;
+ *cs++ = spin->obj[IGT_SPIN_BATCH].offset +
+ offset_in_page(spin->condition);
+ *cs++ = 0;
+ *cs++ = MI_BATCH_BUFFER_END;
+
+ *cs++ = MI_BATCH_BUFFER_END;
+ munmap(map, 4096);
+
+ return handle;
+}
+
+static void bonded_slice(int i915)
+{
+ uint32_t ctx;
+ int *stop;
+
+ igt_require(gem_scheduler_has_semaphores(i915));
+
+ stop = mmap(0, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
+ igt_assert(stop != MAP_FAILED);
+
+ ctx = gem_context_create(i915);
+
+ for (int class = 0; class < 32; class++) {
+ struct i915_engine_class_instance *siblings;
+ struct drm_i915_gem_exec_object2 obj[3] = {};
+ struct drm_i915_gem_execbuffer2 eb = {};
+ unsigned int count;
+ igt_spin_t *spin;
+
+ siblings = list_engines(i915, 1u << class, &count);
+ if (!siblings)
+ continue;
+
+ if (count < 2) {
+ free(siblings);
+ continue;
+ }
+
+ /*
+ * A: semaphore wait on spinner; cancel spinner
+ * B: unpreemptable spinner
+ *
+ * A waits for running ack from B, if scheduled on the same
+ * engine -> hang.
+ *
+ * C+: background load across engines
+ */
+
+ set_load_balancer(i915, ctx, siblings, count, NULL);
+
+ spin = __igt_spin_new(i915,
+ .ctx = ctx,
+ .flags = (IGT_SPIN_NO_PREEMPTION |
+ IGT_SPIN_POLL_RUN));
+ igt_spin_end(spin); /* we just want its address for later */
+ gem_sync(i915, spin->handle);
+ igt_spin_reset(spin);
+
+ obj[0] = spin->obj[0];
+ obj[1] = spin->obj[1];
igt_assert_eq(IGT_SPIN_BATCH, 1);
?
+ obj[2].handle = create_semaphore_to_spinner(i915, spin);
+
+ eb.buffers_ptr = to_user_pointer(obj);
+ eb.rsvd1 = ctx;
+
+ *stop = 0;
+ igt_fork(child, count + 1) {
+ igt_list_del(&spin->link);
+
+ ctx = gem_context_clone(i915, ctx,
+ I915_CONTEXT_CLONE_ENGINES, 0);
+
+ while (!READ_ONCE(*stop)) {
+ spin = igt_spin_new(i915,
+ .ctx = ctx,
+ .engine = (1 + rand() % count),
With "count + 1" children and rand load my end up uneven across engines
- are you happy with that?
It's using rand, it's going to be uneven. count + 1 isn't significant in
any way. I stopped as soon as I had the test reliably hitting the issue.
+ .flags = IGT_SPIN_POLL_RUN);
+ igt_spin_busywait_until_started(spin);
+ usleep(50000);
50ms, hm, ideally there should be a pipe signal before parent starts the
test to know children have started. Otherwise parent can finish before
they even start, no?
The children are just to provide noise. The requirement is that we have
enough load across the system to cause timeslicing to kick in.
+ igt_spin_free(i915, spin);
+ }
+
+ gem_context_destroy(i915, ctx);
+ }
+
+ igt_until_timeout(5) {
+ igt_spin_reset(spin);
What is the reset for?
We are reusing the spinner inside the loop.
+
+ /* A: Submit the semaphore wait */
+ eb.buffer_count = 3;
+ eb.flags = (1 + rand() % count) | I915_EXEC_FENCE_OUT;
+ gem_execbuf_wr(i915, &eb);
+
+ /* B: Submit the spinner (in parallel) */
How in parallel when it is the same context so they are implicitly in order?
Different engines with different timelines, using the submit to request
parallel execution.
Yeah I kind of missed a few things. Looks good. For completeness you
should also add a flavour which actually sets up the bond so the "if
(bond)" path in virtual_bond_execute is also exercised. But this looks good.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx