On Thu, Sep 04, 2014 at 10:34:13AM +0200, Daniel Vetter wrote: > On Wed, Sep 03, 2014 at 02:47:21PM -0300, Paulo Zanoni wrote: > > From: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> > > > > If we don't reset exit_handler_count before forking, we may have a > > case where the forked process is killed before it even does > > "exit_handler_count = 0": in that case, it is still finishing forking. > > When that happens, we may end up calling our exit handlers. On the > > specific bug I'm investigating, we call igt_reset_connnectors(), which > > ends up in a deadlock inside malloc_atfork. If we attach gdb to the > > forked process and get a backtrace, we have: > > > > (gdb) bt > > 0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 > > 1 0x00007f15634d36bf in _L_lock_10524 () from /lib/x86_64-linux-gnu/libc.so.6 > > 2 0x00007f15634d12ef in malloc_atfork (sz=139729840351352, caller=<optimized out>) at arena.c:181 > > 3 0x00007f15640466a1 in drmMalloc () from /usr/lib/x86_64-linux-gnu/libdrm.so.2 > > 4 0x00007f1564049ad7 in drmModeGetResources () from /usr/lib/x86_64-linux-gnu/libdrm.so.2 > > 5 0x0000000000408f84 in igt_reset_connectors () at igt_kms.c:1656 > > 6 0x00000000004092dc in call_exit_handlers (sig=15) at igt_core.c:1130 > > 7 fatal_sig_handler (sig=15) at igt_core.c:1154 > > 8 <signal handler called> > > 9 0x00007f15634cce60 in ptmalloc_unlock_all2 () at arena.c:298 > > 10 0x00007f156350ca3f in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:188 > > 11 0x000000000040a029 in __igt_fork_helper (proc=proc@entry=0x610fc4 <signal_helper>) at igt_core.c:910 > > 12 0x000000000040459d in igt_fork_signal_helper () at igt_aux.c:110 > > 13 0x0000000000402ab7 in __real_main63 () at bug.c:76 > > 14 0x000000000040296e in main (argc=<optimized out>, argv=<optimized out>) at bug.c:63 > > > > After doing some searches for "stuck at malloc_atfork", it seems to me > > we probably shouldn't be doing any malloc calls at this point of the > > code, so the best way to do that is to make sure we can't really run > > the exit handlers. > > > > So on this patch, instead of resetting the exit handlers after > > forking, we reset them before forking, and then restore the original > > value on the parent process. > > > > I can reproduce this problem by running "./kms_flip --run-subtest > > 2x-flip-vs-modeset" under an infinite loop. Usually after a few > > hundred calls, we end up stuck on the deadlock mentioned above. QA > > says this problem happens every time, but I'm not sure what is the > > difference between our environments that makes the race condition so > > much easier for them. > > > > The kms_flip.c problem can be considered a regression introduced by: > > commit eef768f283466b6d7cb3f08381f72ccf3951dc99 > > Author: Thomas Wood <thomas.wood@xxxxxxxxx> > > Date: Wed Jun 18 14:28:43 2014 +0100 > > tests: enable extra connectors in kms_flip and kms_pipe_crc_basic > > > > even though this commit is not the one that introduced the real > > problem. > > > > It is also possible to reproduce this problem with a few modifications > > to template.c: > > - Add a call to igt_enable_connectors() inside the first fixture. > > - Add igt_fork_signal_helper() and igt_stop_signal_helper() calls > > around subtest B. > > > > Cc: Thomas Wood <thomas.wood@xxxxxxxxx> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81367 > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> > > Awesome piece of debugging! > > But imo the commit message is missing the explanation why the forked child > exists so quickly - the reason is that for skipped tests the parent kills > it right away, before libc has completed the setup code in the child. > We've had fun with this earlier, see > > commit 139c72f38a07c545f5a9ab5fa3750779987b9275 > Author: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Date: Tue Dec 3 16:44:55 2013 +0000 > > drmtest: Avoid wrong PID/TID after clone races > > and the earlier attempt at > > commit 9298dfabd9658315df34616b1e9a10b3579a92bd > Author: Daniel Vetter <daniel.vetter@xxxxxxxx> > Date: Fri Sep 13 16:38:59 2013 +0200 > > lib: add test for igt_fork_signal_helper I've meant commit a031a1bf93b828585e7147f06145fc5030814547 Author: Daniel Vetter <daniel.vetter@xxxxxxxx> Date: Fri Sep 13 16:43:22 2013 +0200 lib/drmtest: ducttape over fork race Aside: For bugs in the test library itself I've started to create test-testcases in tests/igt_*.c. You can run them with a simple $ make check If you have an idea for a testcase then please add it, I myself didn't come up with anything. Note that the testcase can't rely at all on an i915 device being enabled, since this is meant to be run while building as a normal user on any kind of machine. So it's really just tests for the core library code. Thanks, Daniel > > I've added a note and commment and merged the patch, thanks. > -Daniel > > > > --- > > lib/igt_core.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/lib/igt_core.c b/lib/igt_core.c > > index 7f14b17..e9a27e3 100644 > > --- a/lib/igt_core.c > > +++ b/lib/igt_core.c > > @@ -895,6 +895,7 @@ bool __igt_fork_helper(struct igt_helper_process *proc) > > { > > pid_t pid; > > int id; > > + int tmp_count; > > > > assert(!proc->running); > > assert(helper_process_count < ARRAY_SIZE(helper_process_pids)); > > @@ -904,16 +905,19 @@ bool __igt_fork_helper(struct igt_helper_process *proc) > > > > igt_install_exit_handler(fork_helper_exit_handler); > > > > + tmp_count = exit_handler_count; > > + exit_handler_count = 0; > > switch (pid = fork()) { > > case -1: > > + exit_handler_count = tmp_count; > > igt_assert(0); > > case 0: > > - exit_handler_count = 0; > > reset_helper_process_list(); > > oom_adjust_for_doom(); > > > > return true; > > default: > > + exit_handler_count = tmp_count; > > proc->running = true; > > proc->pid = pid; > > proc->id = id; > > -- > > 2.1.0 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx