On 09/30/2013 12:13 PM, Jens Axboe wrote:
> On 09/30/2013 10:20 AM, Roger Sibert wrote:
>> On Mon, Sep 30, 2013 at 12:07 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> On 09/30/2013 07:04 AM, Roger Sibert wrote:
>>>> Hello Everyone,
>>>>
>>>> I was looking to use fio to run full-disk writes to an SSD after doing
>>>> a secure erase, to see how long it takes before the performance
>>>> stabilizes. After roughly 48 hours, I see this on the
>>>> screen:
>>>>
>>>> B2-058:~/longtermruntime # ./fio.64bit.static longtermruntime-192h.fio
>>>> seqwrite-phase: (g=0): rw=write, bs=512K-512K/512K-512K/512K-512K,
>>>> ioengine=libaio, iodepth=16
>>>> fio-2.1.2-15-gd5603
>>>> Starting 1 process
>>>> fio: pid=6895, got signal=11ne] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>>> 06d:07h:05m:31s]
>>>>
>>>> seqwrite-phase: (groupid=0, jobs=1): err= 0: pid=6895: Sun Sep 29 03:40:38 2013
>>>> lat (usec) : 1000=0.01%
>>>> lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=99.15%
>>>> lat (msec) : 100=0.56%, 250=0.28%, 500=0.01%, 750=0.01%
>>>> cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>> issued : total=r=0/w=67108865/d=0, short=r=0/w=0/d=0
>>>>
>>>> Run status group 0 (all jobs):
>>>> WRITE: io=0KB, aggrb=0KB/s, minb=0KB/s, maxb=0KB/s,
>>>> mint=144006511329msec, maxt=144006511329msec
>>>>
>>>> Disk stats (read/write):
>>>> sdb: ios=0/67108865, merge=0/0, ticks=0/2354077568,
>>>> in_queue=2353971492, util=100.00%
>>>> fio: file hash not empty on exit
>>>>
>>>> I took a look at one of the core files:
>>>>
>>>> B2-057:~/longtermruntime # gdb core core
>>>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>>>> Copyright (C) 2009 Free Software Foundation, Inc.
>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>> This is free software: you are free to change and redistribute it.
>>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>>> and "show warranty" for details.
>>>> This GDB was configured as "x86_64-suse-linux".
>>>> For bug reporting instructions, please see:
>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>> "/root/longtermruntime/core": not in executable format: File format
>>>> not recognized
>>>> Missing separate debuginfo for the main executable file
>>>> Try: zypper install -C
>>>> "debuginfo(build-id)=559375f8a046f376897b4923007bff5b07ecd8d4"
>>>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0 0x000000000040a6c9 in ?? ()
>>>>
>>>> Is there anything else I can do to pull more debug information out
>>>> with gdb before restarting/retasking these systems? My gdb skills
>>>> aren't that great.
>>>
>>> I know it's a pain to reproduce (especially after a 48h run), but if you
>>> could edit the Makefile and remove the -O3 from the OPTFLAGS, then run
>>> make clean and make all, and reproduce, the core files will be of more
>>> use.
>>>
>>> For the core files you have now, try running 'bt' when you open them so
>>> I can see a backtrace. That might be enough to see what is going on.
>>>
>>> --
>>> Jens Axboe
>>>
>>
>> Let me try that again... My gdb skills may be bad, but that doesn't mean
>> I shouldn't recognize when I was missing something.
>>
>> I changed how I loaded the core file, which should give you what you were
>> actually asking for.
>>
>> B2-057:~/longtermruntime # gdb ./fio.64bit.static ./core
>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-suse-linux".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /root/longtermruntime/fio.64bit.static...done.
>>
>> warning: core file may not match specified executable file.
>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
>> ddir=<value optimized out>, bs=<value optimized out>,
>> t=<value optimized out>) at stat.c:1517
>> 1517 stat.c: No such file or directory.
>> in stat.c
>> (gdb) bt
>> #0 0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
>> ddir=<value optimized out>, bs=<value optimized out>,
>> t=<value optimized out>) at stat.c:1517
>> #1 0x0000000000440b05 in fio_libaio_queued (nr=1, io_us=0x8929a0,
>> td=0x7fe5e312b000) at engines/libaio.c:199
>> #2 fio_libaio_commit (nr=1, io_us=0x8929a0, td=0x7fe5e312b000) at
>> engines/libaio.c:218
>> #3 0x0000000000405385 in td_io_commit (td=0x7fe5e312b000) at ioengines.c:379
>> #4 0x000000000040572a in td_io_queue (td=0x7fe5e312b000,
>> io_u=0x891f20) at ioengines.c:329
>> #5 0x000000000043692f in do_io (td=0x7fe5e312b000) at backend.c:701
>> #6 thread_main (td=0x7fe5e312b000) at backend.c:1314
>> #7 0x0000000000438447 in fork_main (offset=0, shmid=<value optimized
>> out>) at backend.c:1464
>> #8 run_threads (offset=0, shmid=<value optimized out>) at backend.c:1726
>> #9 0x000000000043889d in fio_backend () at backend.c:1912
>> #10 0x00000000004702a4 in __libc_start_main ()
>> #11 0x0000000000000000 in ?? ()
>
> OK, that helps a whole lot. So my guess is that you ran out of memory.
> Currently fio does not flush out the existing log; it just keeps
> appending to it and flushes at the end.
> This is done so as not to disturb the actual data run, but it does mean
> that for long runs, you can gobble up a lot of memory...
>
> I will commit something that is a little more defensive so we don't
> segfault, but just stop logging. Then we can look into handling it
> better in the future.

I committed this:

http://git.kernel.dk/?p=fio.git;a=commit;h=3c568239a319087a965b06bc2ed94d058810100f

to handle the failure a bit more gracefully, at least.

--
Jens Axboe