Re: [igt-dev] [PATCH i-g-t 2/2] tests/gem_exec_await: Add a memory pressure subtest

On 19/11/2018 15:36, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-11-19 15:22:29)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Memory pressure subtest attempts to provoke system overload which can
cause GPU hangs, especially when combined with spin batches which do
not allow for some nop instructions to provide relief.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  tests/i915/gem_exec_await.c | 107 ++++++++++++++++++++++++++++++++++++
  1 file changed, 107 insertions(+)

diff --git a/tests/i915/gem_exec_await.c b/tests/i915/gem_exec_await.c
index 3ea5b5903c6b..ccb5159a6fe1 100644
--- a/tests/i915/gem_exec_await.c
+++ b/tests/i915/gem_exec_await.c
@@ -30,6 +30,11 @@
#include <sys/ioctl.h>
  #include <sys/signal.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
#define LOCAL_I915_EXEC_NO_RELOC (1<<11)
  #define LOCAL_I915_EXEC_HANDLE_LUT (1<<12)
@@ -227,6 +232,92 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
         free(exec);
  }
+struct thread {
+       pthread_t thread;
+       volatile bool done;
+};
+
+static unsigned long get_avail_ram_mb(void)

intel_get_avail_ram_mb() ?

I thought so, but when things went slow I looked inside and concluded it is not suitable.
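For context, a minimal sketch of what such a helper could look like (this is an assumption about the patch's elided implementation, not the actual body): read the MemAvailable field from /proc/meminfo, which is what intel_get_avail_ram_mb() notably does not use.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: report available RAM in MiB by parsing the
 * MemAvailable field of /proc/meminfo. Returns 0 if the field is
 * missing (kernels before 3.14) or the file cannot be opened. */
static unsigned long get_avail_ram_mb(void)
{
	unsigned long avail_kb = 0;
	char line[128];
	FILE *f;

	f = fopen("/proc/meminfo", "r");
	if (!f)
		return 0;

	/* Scan line by line for "MemAvailable:   <n> kB". */
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "MemAvailable: %lu kB", &avail_kb) == 1)
			break;
	}
	fclose(f);

	return avail_kb / 1024; /* kB -> MiB */
}
```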

+#define PAGE_SIZE 4096
+static void *mempressure(void *arg)
+{
+       struct thread *thread = arg;
+       const unsigned int sz_mb = 2;
+       const unsigned int sz = sz_mb << 20;
+       unsigned int n = 0, max = 0;
+       unsigned int blocks;
+       void **ptr = NULL;
+
+       while (!thread->done) {

You can use READ_ONCE(thread->done) here for familiarity.

Okay, I didn't realize we had copied it to IGT.

+               unsigned long ram_mb = get_avail_ram_mb();
+
+               if (!ptr) {
+                       blocks = ram_mb / sz_mb;
+                       ptr = calloc(blocks, sizeof(void *));
+                       igt_assert(ptr);
+               } else if (ram_mb < 384) {
+                       blocks = max + 1;
+               }
+
+               if (ptr[n])
+                       munmap(ptr[n], sz);
+
+               ptr[n] = mmap(NULL, sz, PROT_WRITE,
+                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+               assert(ptr[n] != MAP_FAILED);
+
+               madvise(ptr[n], sz, MADV_HUGEPAGE);
+
+               for (size_t page = 0; page < sz; page += PAGE_SIZE)
+                       *(volatile uint32_t *)((unsigned char *)ptr[n] + page) =
+                               0;
+
+               if (n > max)
+                       max = n;
+
+               n++;
+
+               if (n >= blocks)
+                       n = 0;

Another method would be to use mlock to force exhaustion.

However, as the supposition is that rcu is part of the underlying
mechanism if you fill the dentry cache we'll exercise both the shrinker
and RCU.

As said in my previous reply: in my testing, at least for the one case I was able to reproduce which has the same symptoms as the bug, the problem went away with the addition of nops.

But yeah, maybe that could be an indirect effect.

Also, this cleaned up patch does not cut it any longer. :( It seems I've lost the magic ingredient for reproducing the stalls during cleanup, so I have to go back and add stuff to get it back.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
