On Wed, Nov 23, 2016 at 12:59:40PM +0200, Abdiel Janulgue wrote:
> A lot of igt testcases need some GPU workload to make sure a race
> window is big enough. Unfortunately having a fixed amount of
> workload leads to spurious test failures or overly long runtimes
> on some fast/slow platforms. This library contains functionality
> to submit GPU workloads that should consume exactly a specific
> amount of time.
>
> v2 : Add recursive batch feature from Chris
> v3 : Drop auto-tuned stuff. Add bo dependency to recursive batch
>      by adding a dummy reloc to the bo as suggested by Ville.
> v4: Fix dependency reloc as write instead of read (Ville).
>     Fix wrong handling of batchbuffer start on ILK causing
>     test failure
> v5: Convert kms_busy to use this api
> v6: Add this library to docs
> v7: Document global use of batch, reuse defines
>     Minor code cleanups.
>     Rename igt_spin_batch and igt_post_spin_batch to
>     igt_spin_batch_new and igt_spin_batch_free
>     respectively (Tomeu Vizoso).
>     Fix error in dependency relocation handling in HSW causing
>     tests to fail.
> v8: Restore correct order of objects in the execbuffer. Batch
>     object should always be last.
> v9 : Add helper to terminate batch manually
> v10: Split timeout function.
>      Clarify function names (Chris)
>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: tomeu@xxxxxxxxxxxxxxx
> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@xxxxxxxxxxxxxxx>
> ---
>  .../intel-gpu-tools/intel-gpu-tools-docs.xml       |   1 +
>  lib/Makefile.sources                               |   2 +
>  lib/igt.h                                          |   1 +
>  lib/igt_dummyload.c                                | 299 +++++++++++++++++++++
>  lib/igt_dummyload.h                                |  47 ++++
>  5 files changed, 350 insertions(+)
>  create mode 100644 lib/igt_dummyload.c
>  create mode 100644 lib/igt_dummyload.h
>
> diff --git a/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml b/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
> index c862f2a..55902ab 100644
> --- a/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
> +++ b/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
> @@ -32,6 +32,7 @@
>      <xi:include href="xml/intel_io.xml"/>
>      <xi:include href="xml/igt_vc4.xml"/>
>      <xi:include href="xml/igt_vgem.xml"/>
> +    <xi:include href="xml/igt_dummyload.xml"/>
>    </chapter>
>    <xi:include href="xml/igt_test_programs.xml"/>
>
> diff --git a/lib/Makefile.sources b/lib/Makefile.sources
> index e8e277b..7fc5ec2 100644
> --- a/lib/Makefile.sources
> +++ b/lib/Makefile.sources
> @@ -75,6 +75,8 @@ lib_source_list = \
>  	igt_draw.h		\
>  	igt_pm.c		\
>  	igt_pm.h		\
> +	igt_dummyload.c		\
> +	igt_dummyload.h		\
>  	uwildmat/uwildmat.h	\
>  	uwildmat/uwildmat.c	\
>  	$(NULL)
> diff --git a/lib/igt.h b/lib/igt.h
> index d751f24..a0028d5 100644
> --- a/lib/igt.h
> +++ b/lib/igt.h
> @@ -32,6 +32,7 @@
>  #include "igt_core.h"
>  #include "igt_debugfs.h"
>  #include "igt_draw.h"
> +#include "igt_dummyload.h"
>  #include "igt_fb.h"
>  #include "igt_gt.h"
>  #include "igt_kms.h"
> diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
> new file mode 100644
> index 0000000..afb0851
> --- /dev/null
> +++ b/lib/igt_dummyload.c
> @@ -0,0 +1,299 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "igt.h"
> +#include "igt_dummyload.h"
> +#include <time.h>
> +#include <signal.h>
> +#include <sys/syscall.h>
> +
> +/**
> + * SECTION:igt_dummyload
> + * @short_description: Library for submitting GPU workloads
> + * @title: Dummyload
> + * @include: igt.h
> + *
> + * A lot of igt testcases need some GPU workload to make sure a race window is
> + * big enough. Unfortunately having a fixed amount of workload leads to
> + * spurious test failures or overly long runtimes on some fast/slow platforms.
> + * This library contains functionality to submit GPU workloads that should
> + * consume exactly a specific amount of time.
> + */
> +
> +#define LOCAL_I915_EXEC_BSD_SHIFT      (13)
> +#define LOCAL_I915_EXEC_BSD_MASK       (3 << LOCAL_I915_EXEC_BSD_SHIFT)
> +
> +#define ENGINE_MASK  (I915_EXEC_RING_MASK | LOCAL_I915_EXEC_BSD_MASK)
> +
> +static const int bo_size = 4096;
> +
> +static void
> +fill_object(struct drm_i915_gem_exec_object2 *obj, uint32_t gem_handle,
> +	    struct drm_i915_gem_relocation_entry *relocs, uint32_t count)
> +{
> +	memset(obj, 0, sizeof(*obj));
> +	obj->handle = gem_handle;
> +	obj->relocation_count = count;
> +	obj->relocs_ptr = (uintptr_t)relocs;
> +}
> +
> +static void
> +fill_reloc(struct drm_i915_gem_relocation_entry *reloc,
> +	   uint32_t gem_handle, uint32_t offset,
> +	   uint32_t read_domains, uint32_t write_domains)
> +{
> +	reloc->target_handle = gem_handle;
> +	reloc->delta = 0;
> +	reloc->offset = offset * sizeof(uint32_t);
> +	reloc->presumed_offset = 0;
> +	reloc->read_domains = read_domains;
> +	reloc->write_domain = write_domains;
> +}
> +
> +/*
> + * Needs to be global. Signal handlers don't accept arguments
> + */
> +static uint32_t *batch;
> +
> +static uint32_t emit_recursive_batch(int fd, int engine, unsigned dep_handle)
> +{
> +	const int gen = intel_gen(intel_get_drm_devid(fd));
> +	struct drm_i915_gem_exec_object2 obj[2];
> +	struct drm_i915_gem_relocation_entry relocs[2];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	unsigned engines[16];
> +	unsigned nengine, handle;
> +	int i = 0, reloc_count = 0, buf_count = 0;
> +
> +	buf_count = 0;
> +	nengine = 0;
> +	if (engine < 0) {
> +		for_each_engine(fd, engine)
> +			if (engine)
> +				engines[nengine++] = engine;
> +	} else {
> +		gem_require_ring(fd, engine);
> +		engines[nengine++] = engine;
> +	}
> +	igt_require(nengine);
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	memset(obj, 0, sizeof(obj));
> +	memset(relocs, 0, sizeof(relocs));
> +
> +	execbuf.buffers_ptr = (uintptr_t) obj;
> +	handle = gem_create(fd, bo_size);
> +	batch = gem_mmap__gtt(fd, handle, bo_size, PROT_WRITE);
> +	igt_assert(batch);
> +	gem_set_domain(fd, handle,
> +			I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> +
> +	if (dep_handle > 0) {
> +		igt_assert(nengine == 1);

I have some examples of using a spinning batch with a write hazard
across multiple engines... I guess you haven't yet looked at enough
use cases ;)

> +		/* dummy write to dependency */
> +		fill_object(&obj[buf_count], dep_handle, NULL, 0);
> +		buf_count++;
> +
> +		fill_reloc(&relocs[reloc_count], dep_handle, 256,
> +			   I915_GEM_DOMAIN_RENDER,
> +			   I915_GEM_DOMAIN_RENDER);
> +		reloc_count++;
> +	}
> +
> +	if (gen >= 8) {
> +		batch[i++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> +		/* recurse */
> +		fill_reloc(&relocs[reloc_count], handle, i,
> +			   I915_GEM_DOMAIN_COMMAND, 0);
> +		batch[i++] = 0;
> +		batch[i++] = 0;
> +	} else if (gen >= 6) {
> +		batch[i++] = MI_BATCH_BUFFER_START | 1 << 8;
> +		/* recurse */
> +		fill_reloc(&relocs[reloc_count], handle, i,
> +			   I915_GEM_DOMAIN_COMMAND, 0);
> +		batch[i++] = 0;
> +	} else {
> +		batch[i++] = MI_BATCH_BUFFER_START | 2 << 6 |
> +			((gen < 4) ? 1 : 0);
> +		/* recurse */
> +		fill_reloc(&relocs[reloc_count], handle, i,
> +			   I915_GEM_DOMAIN_COMMAND, 0);
> +		batch[i++] = 0;
> +		if (gen < 4)
> +			relocs[reloc_count].delta = 1;
> +	}
> +	reloc_count++;
> +
> +	fill_object(&obj[buf_count], handle, relocs, reloc_count);
> +	buf_count++;
> +
> +	for (i = 0; i < nengine; i++) {
> +		execbuf.flags &= ~ENGINE_MASK;
> +		execbuf.flags = engines[i];
> +		execbuf.buffer_count = buf_count;
> +		gem_execbuf(fd, &execbuf);
> +	}
> +
> +	return handle;
> +}
> +
> +static void exit_batch_handler(int sig, siginfo_t *info, void *spin)
> +{
> +	*batch = MI_BATCH_BUFFER_END;
> +	__sync_synchronize();
> +}
> +
> +/**
> + * igt_spin_batch_new:
> + * @fd: open i915 drm file descriptor
> + * @engine: Ring to execute batch OR'd with execbuf flags. If value is less
> + * than 0, execute on all available rings.
> + * @dep_handle: handle to a buffer object dependency. If greater than 0, add a
> + * relocation entry to this buffer within the batch.
> + *
> + * Start a recursive batch on a ring. Immediately returns a #igt_spin_t that
> + * contains the batch's handle that can be waited upon. The returned structure
> + * must be passed to igt_spin_batch_free() for post-processing.
> + *
> + * Returns:
> + * Structure with helper internal state for igt_spin_batch_free().
> + */
> +igt_spin_t *
> +igt_spin_batch_new(int fd, int engine, unsigned dep_handle)
> +{
> +	igt_spin_t *spin = calloc(1, sizeof(struct igt_spin));
> +	uint32_t handle = emit_recursive_batch(fd, engine, dep_handle);
> +	int64_t wait_timeout = 0;
> +	igt_assert_eq(gem_wait(fd, handle, &wait_timeout), -ETIME);

This is gem_bo_busy(). The gem_wait() was for sanity-checking the
gem_wait.c test setup.

A really useful improvement (especially for a library function) is to
hook into an exit handler to ensure that all submitted batches are
completed. You can hook into gem_quiescent_gpu() (as that is a useful
catch).

> +
> +	spin->handle = handle;
> +	spin->batch = batch;
> +	spin->timer = NULL;
> +
> +	return spin;
> +}
> +
> +/**
> + * igt_spin_batch_set_timeout:
> + * @spin: spin batch state from igt_spin_batch_new()
> + * @ns: amount of time in nanoseconds the batch continues to execute
> + * before finishing.
> + *
> + * Specify a timeout. This ends the recursive batch associated with @spin after
> + * the timeout has elapsed.
> + */
> +void igt_spin_batch_set_timeout(igt_spin_t *spin, int64_t ns)
> +{
> +	timer_t timer;
> +	struct sigevent sev;
> +	struct sigaction act;
> +	struct itimerspec its;
> +
> +	igt_assert(ns > 0);
> +	if (!spin)
> +		return;
> +
> +	memset(&sev, 0, sizeof(sev));
> +	sev.sigev_notify = SIGEV_SIGNAL | SIGEV_THREAD_ID;
> +	sev.sigev_notify_thread_id = gettid();
> +	sev.sigev_signo = SIGRTMIN + 1;
> +	igt_assert(timer_create(CLOCK_MONOTONIC, &sev, &timer) == 0);
> +	igt_assert(timer > 0);
> +
> +	memset(&act, 0, sizeof(act));
> +	act.sa_sigaction = exit_batch_handler;
> +	act.sa_flags = SA_SIGINFO;
> +	igt_assert(sigaction(SIGRTMIN + 1, &act, NULL) == 0);
> +
> +	memset(&its, 0, sizeof(its));
> +	its.it_value.tv_sec = ns / NSEC_PER_SEC;
> +	its.it_value.tv_nsec = ns % NSEC_PER_SEC;
> +	igt_assert(timer_settime(timer, 0, &its, NULL) == 0);
> +
> +	spin->timer = timer;
> +}
> +
> +/**
> + * igt_spin_batch_end:
> + * @spin: spin batch state from igt_spin_batch_new()
> + *
> + * End the recursive batch associated with @spin manually.
> + */
> +void igt_spin_batch_end(igt_spin_t *spin)
> +{
> +	if (!spin)
> +		return;
> +
> +	if (spin->handle > 0)
> +		exit_batch_handler(0, NULL, NULL);

We should never create an igt_spin_t with an invalid handle, so all the
spin->handle > 0 checks are pointless. And you already exploded on !spin
during construction. Please make your mind up between testing for a
NULL spin in the caller or the callee.

> +/**
> + * igt_spin_batch_free:
> + * @fd: open i915 drm file descriptor
> + * @spin: spin batch state from igt_spin_batch_new()
> + *
> + * This function does the necessary post-processing after starting a recursive
> + * batch with igt_spin_batch_new().
> + */
> +void igt_spin_batch_free(int fd, igt_spin_t *spin)
> +{
> +	if (!spin)
> +		return;
> +
> +	if (spin->handle > 0)
> +		gem_close(fd, spin->handle);
> +
> +	if (spin->timer > 0)
> +		timer_delete(spin->timer);

Awooga! Before you unmap it, it is imperative that you ensure the batch
is terminated with BBE.

> +	munmap(spin->batch, bo_size);
> +	free(spin);
> +}
> +
> +/**
> + * igt_spin_batch_wait:
> + * @fd: open i915 drm file descriptor
> + * @ns: amount of time in nanoseconds the batch continues to execute
> + * before finishing.
> + * @engine: ring to execute batch OR'd with execbuf flags. If value is less
> + * than 0, execute on all available rings.
> + * @dep_handle: handle to a buffer object dependency. If greater than 0, include
> + * this buffer on the wait dependency
> + *
> + * Convenience function similar to igt_spin_batch_new() then setting the timeout
> + * with igt_spin_batch_set_timeout(), but waits on the recursive batch to finish
> + * instead of returning right away. The function also does the necessary
> + * post-processing automatically.
> + */
> +void igt_spin_batch_wait(int fd, int64_t ns, int engine, unsigned dep_handle)
> +{
> +	igt_spin_t *spin = igt_spin_batch_new(fd, engine, dep_handle);
> +	int64_t wait_timeout = ns + (0.5 * NSEC_PER_SEC);
> +	igt_spin_batch_set_timeout(spin, wait_timeout);
> +	igt_assert_eq(gem_wait(fd, spin->handle, &wait_timeout), 0);

This assert is interesting, but may fail simply due to scheduling onto
another CPU. This whole function is a risky nanosleep()...

Seems pointless; the idea is to create an asynchronous load on the GPU
that we later terminate or use in some creative fashion.

> +
> +	igt_spin_batch_free(fd, spin);
> +}

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx