On 2014-07-10 00:56, Michael Mattsson wrote:
Hey, I've got 8 identical CentOS 6.5 clients that randomly keep hanging fio when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10; they both behave the same. I've also tried piping the output to tee instead of redirecting to a file, and I tried --output with a specified output file, still the same problem. My fio command runs through its tests flawlessly without --status-interval and exits cleanly every time. Anywhere from 0 to 5 clients can be affected. Running strace on the process that seems hung yields the following output:

$ strace -p 31055
Process 31055 attached - interrupt to quit
futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL
Strange, it must be stuck on the stat mutex, but I don't immediately see why that would happen. Does the attached patch make any difference for you, both in getting rid of the hang and in still producing output at the desired intervals?
-- Jens Axboe
diff --git a/stat.c b/stat.c
index 979c8100d378..93316a239f7b 100644
--- a/stat.c
+++ b/stat.c
@@ -1466,11 +1466,12 @@ static void *__show_running_run_stats(void fio_unused *arg)
  * in the sig handler, but we should be disturbing the system less by just
  * creating a thread to do it.
  */
-void show_running_run_stats(void)
+int show_running_run_stats(void)
 {
 	pthread_t thread;
 
-	fio_mutex_down(stat_mutex);
+	if (fio_mutex_down_trylock(stat_mutex))
+		return 1;
 
 	if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) {
 		int err;
@@ -1479,10 +1480,11 @@ void show_running_run_stats(void)
 		if (err)
 			log_err("fio: DU thread detach failed: %s\n", strerror(err));
 
-		return;
+		return 0;
 	}
 
 	fio_mutex_up(stat_mutex);
+	return 1;
 }
 
 static int status_interval_init;
@@ -1531,8 +1533,8 @@ void check_for_running_stats(void)
 			fio_gettime(&status_time, NULL);
 			status_interval_init = 1;
 		} else if (mtime_since_now(&status_time) >= status_interval) {
-			show_running_run_stats();
-			fio_gettime(&status_time, NULL);
+			if (!show_running_run_stats())
+				fio_gettime(&status_time, NULL);
 			return;
 		}
 	}
diff --git a/stat.h b/stat.h
index 2e46175053e8..82b8e973e4be 100644
--- a/stat.h
+++ b/stat.h
@@ -218,7 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs);
 extern int calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void show_run_stats(void);
-extern void show_running_run_stats(void);
+extern int show_running_run_stats(void);
 extern void check_for_running_stats(void);
 extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr);
 extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src);