Re: core dump / segfault after 48 hour run

On Mon, Sep 30, 2013 at 12:07 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 09/30/2013 07:04 AM, Roger Sibert wrote:
>> Hello Everyone,
>>
>> I was looking to use fio to run full-disk writes to an SSD after doing
>> a secure erase, to see how long it takes before the performance
>> stabilizes.  After about 48 hours, give or take, I see this on the
>> screen.
>>
>> B2-058:~/longtermruntime # ./fio.64bit.static longtermruntime-192h.fio
>> seqwrite-phase: (g=0): rw=write, bs=512K-512K/512K-512K/512K-512K,
>> ioengine=libaio, iodepth=16
>> fio-2.1.2-15-gd5603
>> Starting 1 process
>> fio: pid=6895, got signal=11ne] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> 06d:07h:05m:31s]
>>
>> seqwrite-phase: (groupid=0, jobs=1): err= 0: pid=6895: Sun Sep 29 03:40:38 2013
>>     lat (usec) : 1000=0.01%
>>     lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=99.15%
>>     lat (msec) : 100=0.56%, 250=0.28%, 500=0.01%, 750=0.01%
>>   cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      issued    : total=r=0/w=67108865/d=0, short=r=0/w=0/d=0
>>
>> Run status group 0 (all jobs):
>>   WRITE: io=0KB, aggrb=0KB/s, minb=0KB/s, maxb=0KB/s,
>> mint=144006511329msec, maxt=144006511329msec
>>
>> Disk stats (read/write):
>>   sdb: ios=0/67108865, merge=0/0, ticks=0/2354077568,
>> in_queue=2353971492, util=100.00%
>> fio: file hash not empty on exit
>>
>> I took a look at one of the core files
>>
>> B2-057:~/longtermruntime # gdb core core
>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-suse-linux".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> "/root/longtermruntime/core": not in executable format: File format
>> not recognized
>> Missing separate debuginfo for the main executable file
>> Try: zypper install -C
>> "debuginfo(build-id)=559375f8a046f376897b4923007bff5b07ecd8d4"
>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x000000000040a6c9 in ?? ()
>>
>> Is there anything else I can do to help pull out more debug
>> information using gdb before restarting/retasking this system?  My gdb
>> skills aren't that great.
>
> I know it's a pain to reproduce (especially after a 48h run), but could
> you edit the Makefile and remove the -O3 from the OPTFLAGS, then make
> clean, make all, and reproduce? The core files will then be of more
> use.
>
> For the core files you have now, try and do a 'bt' when you open them so
> I can see a backtrace. That might be enough to see what is going on.
>
> --
> Jens Axboe
>

Hello Jens,

I should be able to do the rebuild and rerun, and have the results in a
few days, give or take.
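
A minimal sketch of the rebuild step suggested above, assuming the
Makefile sets its optimization level on a line that begins with OPTFLAGS
(the exact layout of that line is an assumption; check it before
running). The helper name strip_o3 is made up for illustration.

```shell
# strip_o3 is a hypothetical helper: it removes -O3 from the line that
# starts with OPTFLAGS in the given Makefile and keeps a .bak copy of
# the original file next to it.
strip_o3() {
    sed -i.bak '/^OPTFLAGS/ s/ *-O3//' "$1"
}

# Intended use from the top of the fio source tree (not run here):
#   strip_o3 Makefile && make clean && make all
```

Only the -O3 token is removed; any other flags on that line are left as
they were.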

Here are the results of the bt.

#0  0x000000000040a6c9 in ?? ()
(gdb) bt
#0  0x000000000040a6c9 in ?? ()
#1  0x0000000000891f20 in ?? ()
#2  0x0000000008d2d7f5 in ?? ()
#3  0x00007fe500080000 in ?? ()
#4  0x00007fe500000001 in ?? ()
#5  0x00007fe5e312b000 in ?? ()
#6  0x00000000008929a0 in ?? ()
#7  0x0000000000870eb0 in ?? ()
#8  0x0000000000440b05 in ?? ()
#9  0x00007fff4efcfb60 in ?? ()
#10 0x0000000000000008 in ?? ()
#11 0x000000000010bac4 in ?? ()
#12 0x00000000000a8c7c in ?? ()
#13 0x00007fe5e312b000 in ?? ()
#14 0x00007fe5e312b000 in ?? ()
#15 0x0000000000891f20 in ?? ()
#16 0x00007fe5e312b000 in ?? ()
#17 0x00007fe5e312ffa0 in ?? ()
#18 0x00007fe5e312ffb0 in ?? ()
#19 0x00007fe5e312fcb8 in ?? ()
#20 0x0000000000405385 in ?? ()
#21 0x00000000005388bb in ?? ()
#22 0x00000000005388cb in ?? ()
#23 0x0000000000539fe2 in ?? ()
#24 0x0000000000000000 in ?? ()
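
The frames above may all show ?? because gdb was started as
`gdb core core`, which hands it the core file in place of the executable
(hence the "not in executable format" message). gdb expects the program
first and the core file second. A minimal sketch, assuming the
fio.64bit.static binary that produced the core is still available;
core_bt is a hypothetical helper name:

```shell
# core_bt is a hypothetical helper: it prints a backtrace and the
# registers from a core file without an interactive gdb session.
# $1 = the executable that crashed, $2 = the core file it produced.
core_bt() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "usage: core_bt <executable> <core>" >&2
        return 2
    fi
    gdb -batch -ex "bt" -ex "info registers" "$1" "$2"
}

# Intended use (not run here):
#   core_bt ./fio.64bit.static core > bt.txt
```

Even with the right executable, frames can stay unresolved if the binary
was stripped, so this is something to try rather than a guaranteed fix.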

Thanks,
Roger