Re: core dump / segfault after 48 hour run

On 09/30/2013 10:20 AM, Roger Sibert wrote:
> On Mon, Sep 30, 2013 at 12:07 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 09/30/2013 07:04 AM, Roger Sibert wrote:
>>> Hello Everyone,
>>>
>>> I was looking to use fio to run full-disk writes to an SSD after doing
>>> a secure erase, to see how long it takes before the performance
>>> stabilizes.  After roughly 48 hours I see this on the
>>> screen.
>>>
>>> B2-058:~/longtermruntime # ./fio.64bit.static longtermruntime-192h.fio
>>> seqwrite-phase: (g=0): rw=write, bs=512K-512K/512K-512K/512K-512K,
>>> ioengine=libaio, iodepth=16
>>> fio-2.1.2-15-gd5603
>>> Starting 1 process
>>> fio: pid=6895, got signal=11ne] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>> 06d:07h:05m:31s]
>>>
>>> seqwrite-phase: (groupid=0, jobs=1): err= 0: pid=6895: Sun Sep 29 03:40:38 2013
>>>     lat (usec) : 1000=0.01%
>>>     lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=99.15%
>>>     lat (msec) : 100=0.56%, 250=0.28%, 500=0.01%, 750=0.01%
>>>   cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
>>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>      issued    : total=r=0/w=67108865/d=0, short=r=0/w=0/d=0
>>>
>>> Run status group 0 (all jobs):
>>>   WRITE: io=0KB, aggrb=0KB/s, minb=0KB/s, maxb=0KB/s,
>>> mint=144006511329msec, maxt=144006511329msec
>>>
>>> Disk stats (read/write):
>>>   sdb: ios=0/67108865, merge=0/0, ticks=0/2354077568,
>>> in_queue=2353971492, util=100.00%
>>> fio: file hash not empty on exit
>>>
>>> I took a look at one of the core files
>>>
>>> B2-057:~/longtermruntime # gdb core core
>>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>>> Copyright (C) 2009 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-suse-linux".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> "/root/longtermruntime/core": not in executable format: File format
>>> not recognized
>>> Missing separate debuginfo for the main executable file
>>> Try: zypper install -C
>>> "debuginfo(build-id)=559375f8a046f376897b4923007bff5b07ecd8d4"
>>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x000000000040a6c9 in ?? ()
>>>
>>> Is there anything else that I can do to help pull out more debug
>>> information using gdb prior to restarting/retasking this system?  My gdb
>>> skills aren't that great.
>>
>> I know it's a pain to reproduce (especially after a 48h run), but if you
>> could edit the Makefile and remove the -O3 from the OPTFLAGS, then make
>> clean, make all, and then reproduce. Then the core files will be of more
>> use.
>>
>> For the core files you have now, try and do a 'bt' when you open them so
>> I can see a backtrace. That might be enough to see what is going on.
>>
>> --
>> Jens Axboe
>>
> 
> Let me try that again...  My gdb skills may be bad, but that doesn't mean
> I shouldn't recognize I was missing something.
> 
> Changed how I called the core file, which should give you what you were
> actually asking for.
> 
> B2-057:~/longtermruntime # gdb ./fio.64bit.static ./core
> GNU gdb (GDB) SUSE (7.0-0.4.16)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/longtermruntime/fio.64bit.static...done.
> 
> warning: core file may not match specified executable file.
> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
> ddir=<value optimized out>, bs=<value optimized out>,
>     t=<value optimized out>) at stat.c:1517
> 1517    stat.c: No such file or directory.
>         in stat.c
> (gdb) bt
> #0  0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
> ddir=<value optimized out>, bs=<value optimized out>,
>     t=<value optimized out>) at stat.c:1517
> #1  0x0000000000440b05 in fio_libaio_queued (nr=1, io_us=0x8929a0,
> td=0x7fe5e312b000) at engines/libaio.c:199
> #2  fio_libaio_commit (nr=1, io_us=0x8929a0, td=0x7fe5e312b000) at
> engines/libaio.c:218
> #3  0x0000000000405385 in td_io_commit (td=0x7fe5e312b000) at ioengines.c:379
> #4  0x000000000040572a in td_io_queue (td=0x7fe5e312b000,
> io_u=0x891f20) at ioengines.c:329
> #5  0x000000000043692f in do_io (td=0x7fe5e312b000) at backend.c:701
> #6  thread_main (td=0x7fe5e312b000) at backend.c:1314
> #7  0x0000000000438447 in fork_main (offset=0, shmid=<value optimized
> out>) at backend.c:1464
> #8  run_threads (offset=0, shmid=<value optimized out>) at backend.c:1726
> #9  0x000000000043889d in fio_backend () at backend.c:1912
> #10 0x00000000004702a4 in __libc_start_main ()
> #11 0x0000000000000000 in ?? ()

OK, that helps a whole lot. So my guess is that you ran out of memory.
Currently fio does not flush out the existing log, it just keeps
appending to it and flushes at the end. This is done to not disturb the
actual data run, but it does mean that for long runs, you can gobble up
a lot of memory...

I will commit something that is a little more defensive so we don't
actually segfault, just stop logging. Then we can look into handling it
better in the future.

-- 
Jens Axboe
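
For illustration only, here is a minimal sketch of the "stop logging instead
of segfaulting" idea described in the reply above. It is not fio's actual
code and not the eventual commit: the types and names (`struct sample`,
`struct sample_log`, `log_append`, the `cap` and `disabled` fields) are all
hypothetical. It assumes an in-memory log that grows by doubling, and it
treats a failed `realloc` (or a configured ceiling) as a signal to drop
further samples rather than write through a NULL pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sample record; not fio's real log entry layout. */
struct sample {
    uint64_t time_ms;
    uint64_t val;
};

/* Hypothetical in-memory log that grows by doubling. */
struct sample_log {
    struct sample *buf;
    size_t nr;       /* samples currently stored */
    size_t max;      /* current capacity, in samples */
    size_t cap;      /* optional hard ceiling in samples, 0 = none */
    int disabled;    /* set once the log can no longer grow */
};

/*
 * Append one sample. Returns 0 if stored, -1 if the sample was dropped
 * because the log could not grow. After the first failure the log stays
 * disabled; samples already stored remain valid and can still be flushed.
 */
static int log_append(struct sample_log *log, uint64_t time_ms, uint64_t val)
{
    assert(log != NULL);

    if (log->disabled)
        return -1;

    if (log->nr == log->max) {
        size_t new_max = log->max ? log->max * 2 : 1024;
        struct sample *nbuf;

        if (log->cap && new_max > log->cap)
            new_max = log->cap;

        /* Ceiling reached, or the byte count would overflow size_t. */
        if (new_max <= log->max || new_max > SIZE_MAX / sizeof(*nbuf)) {
            log->disabled = 1;
            return -1;
        }

        nbuf = realloc(log->buf, new_max * sizeof(*nbuf));
        if (!nbuf) {
            /* Out of memory: keep the old buffer, stop logging. */
            log->disabled = 1;
            return -1;
        }
        log->buf = nbuf;
        log->max = new_max;
    }

    log->buf[log->nr].time_ms = time_ms;
    log->buf[log->nr].val = val;
    log->nr++;
    return 0;
}
```

The key point is that the old buffer is only replaced after `realloc`
succeeds, so a failure leaves the existing samples intact and the caller can
carry on with the I/O run while logging is switched off.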
