On Wed, Jan 25, 2023 at 02:02:40PM -0500, Jeff Hostetler wrote:
> Can you tell from your stess test whether the fsmonitor-daemon
> is crashing?  (It might be subtle since the daemon is auto-started
> if necessary, so it might be crashing and silently getting restarted
> by the next command.)
>
> I ask because a SIGPIPE in the client would make me think that the
> server suddenly closed the connection unexpectedly, like if it had
> SIGSEGV'd or something.

Last time around I only looked at the failing test case, and didn't
notice anything that might have indicated the cause of the SIGPIPE.
This time I chanced to look a bit further up in the test log, and:

  expecting success of 7527.55 'Matrix[uc:true][fsm:true] move_directory_contents_deeper':
  		matrix_clean_up_repo &&
  		$fn &&
  		if test $uc = false && test $fsm = false
  		then
  			git status --porcelain=v1 >.git/expect.$fn
  		else
  			git status --porcelain=v1 >.git/actual.$fn &&
  			test_cmp .git/expect.$fn .git/actual.$fn
  		fi

  + matrix_clean_up_repo
  + git reset --hard HEAD
  HEAD is now at 1d1edcb initial
  + git clean -fd
  + move_directory_contents_deeper
  + mkdir T1/_new_
  + mv T1/F1 T1/F2 T1/T2 T1/_new_
  + test true = false
  + git status --porcelain=v1
  error: read error: Connection reset by peer
  error: could not read IPC response
  + test_cmp .git/expect.move_directory_contents_deeper .git/actual.move_directory_contents_deeper
  + test 2 -ne 2
  + eval diff -u "$@"
  + diff -u .git/expect.move_directory_contents_deeper .git/actual.move_directory_contents_deeper
  ok 55 - Matrix[uc:true][fsm:true] move_directory_contents_deeper

  expecting success of 7527.56 'Matrix[uc:true][fsm:true] move_directory_up':
  		matrix_clean_up_repo &&
  		$fn &&
  		if test $uc = false && test $fsm = false
  		then
  			git status --porcelain=v1 >.git/expect.$fn
  		else
  			git status --porcelain=v1 >.git/actual.$fn &&
  			test_cmp .git/expect.$fn .git/actual.$fn
  		fi

  + matrix_clean_up_repo
  + git reset --hard HEAD
  HEAD is now at 1d1edcb initial
  + git clean -fd
  Removing T1/_new_/
  + move_directory_up
  + mv T1/T2/T3 T1
  + test true = false
  + git status --porcelain=v1
  error: last command exited with $?=141
  not ok 56 - Matrix[uc:true][fsm:true] move_directory_up

Notice that "error: read error: Connection reset by peer" in the
previous, still successful test case!

I ran it a couple of times, and saw the same error message in the
still successful '42 - Matrix[uc:false][fsm:true]
move_directory_contents_deeper' followed by a SIGPIPE caused failure
in the next test case.

And now that I knew what to look for, I noticed this error message in
the very first test failure I reported the other day, which didn't
fail because of SIGPIPE, and in that case the error message was
printed in the failed test case.

And there were a few cases that failed because of SIGPIPE but there
were no error messages at all.

I can't say what caused these errors, but I doubt that anything
segfaulted, because segfaults are logged in syslog, and I haven't
found any such syslog entries coinciding with stress testing.
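
For reference, the "$?=141" in the failing test is how the shell
reports a process killed by a signal: 128 plus the signal number, and
signal 13 is SIGPIPE.  A minimal sketch of decoding such a status
follows; the helper name decode_status is made up for illustration and
is not part of the test suite:

```shell
# Hypothetical helper: turn a shell exit status into a readable
# description.  Statuses above 128 mean "killed by signal (status - 128)".
decode_status () {
	status=$1
	if test "$status" -gt 128
	then
		sig=$((status - 128))
		# "kill -l <number>" prints the signal name without the SIG prefix.
		echo "killed by signal $sig ($(kill -l "$sig"))"
	else
		echo "exited normally with status $status"
	fi
}

decode_status 141	# prints "killed by signal 13 (PIPE)"
```

This only confirms that 'git status' itself died of SIGPIPE; it says
nothing about why the daemon end of the connection went away.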