> -----Original Message----- > From: Jens Axboe [mailto:axboe@xxxxxxxxx] > Sent: Tuesday, 16 December, 2014 4:43 PM ... > > I think I figured out what was going in, another user reported a similar > issue. Since you have it running, can you try and gdb attach to one of > the fio jobs? Then add a breakpoint in backend.c:__update_tv_cache(), > and print td->tv_cache after each call. Just do that a handful of times, > enough so we can see if it's incrementing or if it's stuck at the same > value. > That is a static inline function, and its caller do_verify also got inlined. I tried a breakpoint by line number in thread_main; although IOs resumed, it doesn't trigger. (gdb) b backend.c:1536 Breakpoint 2 at 0x44c67b: file backend.c, line 1536. (gdb) cont objdump -drS around that address is: check_update_rusage(td); if (td->error) 44c66e: 83 bb 90 4e 00 00 00 cmpl $0x0,0x4e90(%rbx) 44c675: 0f 84 0d 01 00 00 je 44c788 <thread_main+0x1538> fio_gettime(&td->start, NULL); do_verify(td, verify_bytes); fio_mutex_down(stat_mutex); 44c67b: 48 8b 3d 16 a4 2d 00 mov 0x2da416(%rip),%rdi # 726a98 <stat_mutex> 44c682: e8 a9 40 fe ff callq 430730 <fio_mutex_down> td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start); 44c687: 4c 8b bb 44 4c 00 00 mov 0x4c44(%rbx),%r15 44c68e: 48 89 ef mov %rbp,%rdi 44c691: e8 9a e0 fb ff callq 40a730 <mtime_since_now> fio_gettime(&td->start, NULL); 44c696: 31 f6 xor %esi,%esi 44c698: 48 89 ef mov %rbp,%rdi fio_gettime(&td->start, NULL); do_verify(td, verify_bytes); fio_mutex_down(stat_mutex); td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start); 44c69b: 4c 01 f8 add %r15,%rax 44c69e: 48 89 83 44 4c 00 00 mov %rax,0x4c44(%rbx) fio_gettime(&td->start, NULL); 44c6a5: e8 56 de fb ff callq 40a500 <fio_gettime> fio_mutex_up(stat_mutex); 44c6aa: 48 8b 3d e7 a3 2d 00 mov 0x2da3e7(%rip),%rdi # 726a98 <stat_mutex> 44c6b1: e8 0a 40 fe ff callq 4306c0 <fio_mutex_up> I printed that value for thread 2, let it run for a few seconds, then printed again, in case that helps (tv_usec increments, but tv_sec does not). (gdb) thread 2 [Switching to thread 2 (Thread 0x7fa92bf87700 (LWP 6733))]#0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 (gdb) bt #0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 #1 0x0000000000457058 in fio_libaio_commit (td=0x7fa9a0dd1860) at engines/libaio.c:255 #2 0x000000000040b395 in td_io_commit (td=0x7fa9a0dd1860) at ioengines.c:396 #3 0x000000000040bea1 in td_io_queue (td=0x7fa9a0dd1860, io_u=0x7fa8e401c980) at ioengines.c:343 #4 0x000000000044a75d in do_io (td=0x7fa9a0dd1860) at backend.c:792 #5 0x000000000044c209 in thread_main (data=0x7fa9a0dd1860) at backend.c:1504 #6 0x0000003974c079d1 in start_thread () from /lib64/libpthread.so.0 #7 0x00000039748e8b7d in clone () from /lib64/libc.so.6 (gdb) print td No symbol "td" in current context. (gdb) select-frame 5 (gdb) print td->tv_cache $51 = {tv_sec = 1099511, tv_usec = 641885} (gdb) cont Continuing. ^C Program received signal SIGINT, Interrupt. [Switching to Thread 0x7fa9a99b1720 (LWP 6508)] 0x00000039748acced in nanosleep () from /lib64/libc.so.6 (gdb) thread 2 [Switching to thread 2 (Thread 0x7fa92bf87700 (LWP 6733))]#0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 (gdb) bt #0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 #1 0x0000000000457058 in fio_libaio_commit (td=0x7fa9a0dd1860) at engines/libaio.c:255 #2 0x000000000040b47a in td_io_getevents (td=0x7fa9a0dd1860, min=<value optimized out>, max=<value optimized out>, t=<value optimized out>) at ioengines.c:238 #3 0x000000000042b442 in io_u_queued_complete (td=0x7fa9a0dd1860, min_evts=<value optimized out>, bytes=0x7fa92bf86d00) at io_u.c:1793 #4 0x000000000044a45e in do_io (td=0x7fa9a0dd1860) at backend.c:889 #5 0x000000000044c209 in thread_main (data=0x7fa9a0dd1860) at backend.c:1504 #6 0x0000003974c079d1 in start_thread () from /lib64/libpthread.so.0 #7 0x00000039748e8b7d in clone () from /lib64/libc.so.6 (gdb) select-frame 5 (gdb) print td->tv_cache $52 = {tv_sec = 1099511, tv_usec = 971364} and again after about a minute: (gdb) cont Continuing. ^C Program received signal SIGINT, Interrupt. [Switching to Thread 0x7fa9a99b1720 (LWP 6508)] 0x00000039748acced in nanosleep () from /lib64/libc.so.6 (gdb) thread 2 [Switching to thread 2 (Thread 0x7fa92bf87700 (LWP 6733))]#0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 (gdb) bt #0 0x0000003657600667 in io_submit () from /lib64/libaio.so.1 #1 0x0000000000457058 in fio_libaio_commit (td=0x7fa9a0dd1860) at engines/libaio.c:255 #2 0x000000000040b395 in td_io_commit (td=0x7fa9a0dd1860) at ioengines.c:396 #3 0x000000000040bea1 in td_io_queue (td=0x7fa9a0dd1860, io_u=0x7fa8e401fe80) at ioengines.c:343 #4 0x000000000044a75d in do_io (td=0x7fa9a0dd1860) at backend.c:792 #5 0x000000000044c209 in thread_main (data=0x7fa9a0dd1860) at backend.c:1504 #6 0x0000003974c079d1 in start_thread () from /lib64/libpthread.so.0 #7 0x00000039748e8b7d in clone () from /lib64/libc.so.6 (gdb) select-frame 5 (gdb) print td->tv_cache $53 = {tv_sec = 1099511, tv_usec = 578033} -- To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html