On 2012-06-20 01:05, Erik Lattimore wrote:
> Lately it seems like we've been hitting this more frequently, so I figured
> I'd file a bug. Fio starts up a thread running the function
> disk_thread_main, which periodically calls update_io_ticks, which calls
> update_io_tick_disk on each entry in a circular linked list. The function
> disk_thread_main returns when the global variable "threads" is set to
> null, but it's only checked a couple of times in the loop.
>
> The main thread runs the test and exits, and has registered an atexit
> handler free_shm. This routine sets "threads" to null and frees up
> storage, including the storage where the linked list used by
> update_io_ticks is stored.
>
> Occasionally, somehow, update_io_tick_disk winds up getting called with a
> null pointer and crashing. The problem may be exacerbated when memory is
> tight. Here's the backtrace of the core dump:
>
> Program terminated with signal 11, Segmentation fault.
> #0  update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> 80		if (!du->users)
> (gdb) t apply all bt
>
> Thread 2 (Thread 0x7faab680b700 (LWP 23148)):
> #0  0x00007faab58df377 in shmdt () from /lib64/libc.so.6
> #1  0x000000000040b98d in free_shm () at init.c:231
> #2  0x00007faab583b7f5 in __run_exit_handlers () from /lib64/libc.so.6
> #3  0x00007faab583b845 in exit () from /lib64/libc.so.6
> #4  0x00007faab5824c3d in __libc_start_main () from /lib64/libc.so.6
> #5  0x0000000000408ed9 in _start ()
>
> Thread 1 (Thread 0x7faab32dd700 (LWP 23149)):
> #0  update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> #1  update_io_ticks () at diskutil.c:114
> #2  0x000000000043b303 in disk_thread_main (data=<optimized out>) at backend.c:1589
> #3  0x00007faab61907b6 in start_thread () from /lib64/libpthread.so.0
> #4  0x00007faab58dd9cd in clone () from /lib64/libc.so.6
> #5  0x0000000000000000 in ?? ()
> (gdb) q

This is clearly a race in how the disk util thread is shut down and the
structures freed. I'll take a look at a fix.

It would be useful if you told me how you are hitting this most easily, as
I don't recall seeing it. That would make me more confident in a fix.

Also, are you sure it's threads == NULL, and not the du's themselves being
freed? They are in separate storage. It might be a good idea to have
diskutil.c:free_disk_util() signal and wait for the disk util thread to
shut down before going further.

-- 
Jens Axboe
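A minimal sketch of the "signal and wait for the disk util thread to shut down" idea, assuming a pthread-based tick thread. Every name here (struct du_sketch, disk_thread_sketch, disk_util_start_sketch, free_disk_util_sketch, the 10 ms tick) is a hypothetical stand-in, not fio's actual code, and the list is NULL-terminated rather than circular for simplicity. The thread sleeps on a condition variable instead of polling a global pointer, and the list is freed only after pthread_join() returns, so the walk can never touch freed entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a disk util list entry (not fio's struct disk_util). */
struct du_sketch {
	int users;
	struct du_sketch *next;
};

static struct du_sketch *du_list;	/* freed only after the thread is joined */
static unsigned long visited;		/* entries walked; written only by the thread */

static pthread_mutex_t du_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t du_cond = PTHREAD_COND_INITIALIZER;
static bool du_exit;			/* shutdown request, protected by du_lock */
static pthread_t du_thread;

/* Stand-in for update_io_ticks(): walk every entry in the list. */
static void update_ticks_sketch(void)
{
	for (struct du_sketch *du = du_list; du; du = du->next)
		if (du->users)
			visited++;
}

/* Tick loop: sleeps on a condition variable, so a shutdown request wakes it. */
static void *disk_thread_sketch(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&du_lock);
	while (!du_exit) {
		update_ticks_sketch();

		/* Assumed 10 ms tick; returns early if shutdown is signalled. */
		struct timespec ts;
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_nsec += 10 * 1000000L;
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_sec++;
			ts.tv_nsec -= 1000000000L;
		}
		pthread_cond_timedwait(&du_cond, &du_lock, &ts);
	}
	pthread_mutex_unlock(&du_lock);
	return NULL;
}

/* Build an n-entry list and start the thread. Returns 0 on success, nonzero on failure. */
static int disk_util_start_sketch(int n)
{
	assert(n > 0);
	for (int i = 0; i < n; i++) {
		struct du_sketch *du = calloc(1, sizeof(*du));
		if (!du)
			return -1;
		du->users = 1;
		du->next = du_list;
		du_list = du;
	}
	du_exit = false;
	return pthread_create(&du_thread, NULL, disk_thread_sketch, NULL);
}

/* Signal the thread, wait for it to exit, and only then free the list. */
static int free_disk_util_sketch(void)
{
	pthread_mutex_lock(&du_lock);
	du_exit = true;
	pthread_cond_signal(&du_cond);
	pthread_mutex_unlock(&du_lock);

	int rc = pthread_join(du_thread, NULL);
	if (rc)
		return rc;	/* thread may still be running: do not free */

	while (du_list) {
		struct du_sketch *next = du_list->next;
		free(du_list);
		du_list = next;
	}
	return 0;
}
```

The ordering is the point: set the flag under the lock, signal, join, then free. Applied to the real code, that would presumably mean free_disk_util() (or whatever runs from the exit path) joins the disk util thread before any storage the thread walks is released.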