RE: fio successful completions followed by segfault

Hi,

I modified both fio 3.9 and 3.16, adding logic to run_threads() in backend.c to use pthread_join() to ensure that the worker threads have exited before continuing.
This has run for more than 24 hours without any issues.
Following is the git diff for backend.c:

--Joe


diff --git a/backend.c b/backend.c
index 1c33940..646e783 100644
--- a/backend.c
+++ b/backend.c
@@ -2354,10 +2354,6 @@ reap:
 					break;
 				}
 				fd = NULL;
-				ret = pthread_detach(td->thread);
-				if (ret)
-					log_err("pthread_detach: %s",
-							strerror(ret));
 			} else {
 				pid_t pid;
 				dprint(FD_PROCESS, "will fork\n");
@@ -2455,6 +2451,18 @@ reap:
 	fio_idle_prof_stop();
 
 	update_io_ticks();
+
+
+	for_each_td(td, i) {
+		int ret;
+		if (td->o.use_thread)
+		{
+			ret = pthread_join(td->thread, NULL);
+			if (ret) {
+				log_err("pthread_join: %s\n", strerror(ret));
+			}
+		}
+	}
 }
 
 static void free_disk_util(void)

-----Original Message-----
From: fio-owner@xxxxxxxxxxxxxxx <fio-owner@xxxxxxxxxxxxxxx> On Behalf Of Joseph DeVincentis
Sent: Tuesday, November 19, 2019 1:29 PM
To: fio@xxxxxxxxxxxxxxx
Subject: fio successful completions followed by segfault

Hello,

I am seeing rare segfaults running "fio" threaded.  In these tests, fio is executed in threaded mode for 60-second run periods, over and over again for 24 hours (simplified).

The fio operations always complete and the results are reported as successful; it's just that, rarely, the exit status reported by the fio run is 139 (segfault).  The failure rate is perhaps 1 to 3 fio "runs" over a 24-hour period, so it's not easy to reproduce.

I am running --thread with --numjobs=4.  (I am running 12 VMs, 2 vCPUs and 2 GB of memory each, and this "fio stress" is one of a few things going on in each of the VMs.)

I think the bug is that the worker threads (thread_main) are detached and left to run to completion without coordination by "run_threads()", hence they are in a race with the main code flow.  The main code flow eventually frees the thread-specific memory, and I think that's where this segfault comes from.

The job ("thread_main") threads indicate that it is "done" via changing its state to TD_EXITED.  This allows run_threads() to "reap"
their status, and then the code to finishup/exit.

However, "thread_main" makes function calls and even dereferences the thread specific data AFTER it has indicated its state is TD_EXITED.

The final line of "thread_main" is:
                return (void *) (uintptr_t) td->error;

Since the main code flow doesn't know that a thread is just about to resume and dereference the thread-specific data (it doesn't know anything about the "thread_main" threads), it can't guarantee that it is safe to free the thread-specific data (which it does via atexit()).

I have captured a few of these segfaults with a modified fio that installs a SIGSEGV handler to dump a backtrace, and the segfault has always been on the last line of thread_main, i.e. dereferencing the thread-specific pointer "td".

I have also captured one failure under gdb, automating the invocation of gdb/fio with an expect script.  The script dumped the CPU registers, and the register holding the "td" pointer looked sane, yet it still segfaulted.

This is consistent with a thread that has just resumed: its registers hold what look like valid virtual addresses into the thread-specific memory (the td pointer), but that memory has been freed by atexit() and is no longer mapped.
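
To make the race concrete, here is a minimal, self-contained sketch of the same pattern (the struct and names are illustrative stand-ins, not fio's actual code): a detached worker publishes "done" through shared state and then touches its own data, while the main flow frees that data as soon as it sees the flag.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct thread_data {            /* stand-in for fio's per-thread data */
            volatile int exited;    /* mimics td_set_runstate(td, TD_EXITED) */
            int error;
    };

    static void *worker(void *arg)
    {
            struct thread_data *td = arg;

            /* ... job work happens here ... */
            td->exited = 1;                      /* signal completion */
            /* Race window: main may free(td) before the next line runs. */
            return (void *) (uintptr_t) td->error;
    }

    int main(void)
    {
            struct thread_data *td = calloc(1, sizeof(*td));
            pthread_t thread;

            pthread_create(&thread, NULL, worker, td);
            pthread_detach(thread);              /* main can no longer join */

            while (!td->exited)
                    ;                            /* "reap" by watching the state */

            free(td);                            /* analogous to atexit() cleanup */
            return 0;
    }

Compiled with gcc -pthread, most runs exit cleanly; once in a while the worker's final read of td->error can land after free(td), which is the same use-after-free as the segfault on the last line of thread_main.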

The version of fio is 3.9; however, the latest version (3.16) has the same code, so I expect it could fail the same way if things line up just right.

      fio --version 
      fio-3.9

Solutions:

If the threads are going to be detached, and "completion" is indicated via TD_EXITED, then I don't think thread_main should do anything after "td_set_runstate(td, TD_EXITED);", to avoid any race conditions.

A different option would be to leave the "thread_main" threads joinable (not detached), and add a small loop (for_each_td) that joins the pthreads and guarantees no race condition exists between the worker threads and the main code flow.
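
For reference, the join-based shape is the standard create-joinable/join pattern; a generic sketch (illustrative names, not fio's run_threads itself):

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define NR_JOBS 4                        /* stands in for --numjobs=4 */

    static void *worker(void *arg)
    {
            /* ... job runs to completion ... */
            return NULL;
    }

    int main(void)
    {
            pthread_t threads[NR_JOBS];
            int i, ret;

            for (i = 0; i < NR_JOBS; i++)
                    pthread_create(&threads[i], NULL, worker, NULL);

            /* No pthread_detach(): the threads stay joinable, and this loop
               guarantees every worker has fully returned (including its
               final return statement) before anything is freed. */
            for (i = 0; i < NR_JOBS; i++) {
                    ret = pthread_join(threads[i], NULL);
                    if (ret)
                            fprintf(stderr, "pthread_join: %s\n", strerror(ret));
            }

            /* Only now is it safe to free thread-specific data. */
            return 0;
    }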

Experiments:

1) In backend.c:thread_main, I captured td->error into a local variable before "thread_main" indicates TD_EXITED, changed the return statement to return the local variable, and ran 72 hours without failure in a scenario that fails at least 3 times per 24 hours.
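
In outline, the reordering looks like this sketch (the struct, enum, and helper are simplified stand-ins for fio's real definitions):

    #include <stdint.h>

    struct thread_data { int error; int runstate; };   /* simplified stand-in */
    enum { TD_EXITED = 1 };

    static void td_set_runstate(struct thread_data *td, int state)
    {
            td->runstate = state;      /* fio's real helper does more than this */
    }

    static void *thread_main_tail(struct thread_data *td)
    {
            /* Capture the return value while td is still guaranteed valid... */
            int ret_error = td->error;

            /* ...then publish TD_EXITED.  Main may free td at any time after
               this, so td must not be dereferenced past this point. */
            td_set_runstate(td, TD_EXITED);

            return (void *) (uintptr_t) ret_error;
    }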

2) I removed the pthread_detach() call so the pthreads are left joinable, and added an explicit pthread_join() loop at the end of "run_threads()", and I have yet to see a failure (obviously I need to run for a very long time to be certain; my typical run periods are 24 hours per scenario).

These are my parameters to fio:

fio --name=randrw --minimal --readwrite=randrw \
    --rwmixwrite=0 --bs=8192 --invalidate=1 --end_fsync=0 \
    --directory=/home/disk_stress --size=1048676 \
    --time_based --runtime=10 --group_reporting \
    --numjobs=4  --norandommap --randrepeat=0 --iodepth=8 \
    --direct=1 --thread

Thanks,
Joe
