Re: Segfault in fio_libaio_getevents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16 August 2016 at 15:44, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 08/15/2016 11:53 PM, Sitsofe Wheeler wrote:
>>
>> On 16 August 2016 at 03:12, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> On 08/13/2016 04:26 AM, Sitsofe Wheeler wrote:
>>>>
>>>> With fio-2.12-3-g8a09 I've seen the occasional segfault while running
>>>> a job like this:
>>>>
>>>> ./fio --allow_file_create=0 --randseed=1 --rw=randread --direct=1
>>>> --iodepth=16  --ioengine=libaio --bs=64k --time_based=1 --runtime=12h
>>>> --random_distribution=zipf:0.8 --name=go --filename  /dev/sde
>>>>
>>>> Here's a backtrace:
>>>> Thread 1 (Thread 0x7f8518911740 (LWP 4078)):
>>>> #0  fio_libaio_getevents (td=0x7f850076fa80, min=15, max=15,
>>>> t=<optimized out>) at engines/libaio.c:161
>>>> #1  0x000000000041a493 in td_io_getevents (td=td@entry=0x7f850076fa80,
>>>> min=15, max=<optimized out>, t=0x0) at ioengines.c:241
>>
>> [...]
>>>
>>> Huh, weird. Is this a new regression?
>>
>> I don't think it's new. I've seen this with Fedora's older
>> fio-2.6-1.fc24.x86_64 RPM:
>> Stack trace of thread 8294:
>> #0  0x000000000046b458 fio_libaio_getevents (fio)
>> #1  0x000000000041b9a3 td_io_getevents (fio)
>> #2  0x0000000000440509 io_u_queued_complete (fio)
>> #3  0x000000000040cec4 cleanup_pending_aio (fio)
>> #4  0x000000000045f658 thread_main (fio)
>> #5  0x00000000004609ce run_threads (fio)
>> #6  0x0000000000460e4d fio_backend (fio)
>> #7  0x000000000040ddea main (fio)
>> #8  0x00007fa7f1013731 __libc_start_main (libc.so.6)
>> #9  0x000000000040de39 _start (fio)
>>
>> I don't think it happens in the normal course of operation but only
>> when a device mapper device is somehow removed while fio is still
>> using it? So far I've been unable to identify the steps that lead up
>> to it.
>
> It makes very little sense. td->eo is NULL, which would only happen
> after the process has exited and called close_ioengine(), yet your trace
> is off the main IO path (so the thread definitely hasn't exited). Are
> you positive that your core dump and fio executable/symbols match up?

This is happening more regularly with fio-2.14-40-ge291c which was
statically built with the symbols inside. The key seems to be doing
I/O to an iSCSI connection where the target locks up (causing the I/O
to stall), then perhaps a minute after the hang has happened I try and
Ctrl-C fio. A bit after that fio says this:

fio: terminating on signal 2
fio: job 'go' (state=5) hasn't exited in 60 seconds, it appears to be
stuck. Doing forceful exit of this job.
fio: job 'go2' (state=5) hasn't exited in 60 seconds, it appears to be
stuck. Doing forceful exit of this job.
fio: job 'go3' (state=5) hasn't exited in 60 seconds, it appears to be
stuck. Doing forceful exit of this job.

the usual summary output is produced

Disk stats (read/write):
  sde: ios=519174/0, merge=0/0, ticks=3647890/0, in_queue=7226900, util=99.44%
  sdj: ios=449838/0, merge=0/0, ticks=3626910/0, in_queue=7205840, util=99.29%
  sdh: ios=481590/0, merge=0/0, ticks=3629520/0, in_queue=7208730, util=99.39%
fio: file hash not empty on exit
# fio: io_u error on file /dev/sde: Input/output error: read
offset=204995584, buflen=67584
fio: io_u error on file /dev/sdh: Input/output error: read
offset=9959732736, buflen=75776
fio: io_u error on file /dev/sdj: Input/output error: read
offset=803390594048, buflen=84480

but the cores are also made.

Backtrace says this:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  fio_libaio_getevents (td=0x7fb0711e0ad0, min=0, max=1,
t=<optimized out>) at engines/libaio.c:160
#1  0x000000000040f405 in td_io_getevents (td=td@entry=0x7fb0711e0ad0,
min=<optimized out>, max=<optimized out>, t=<optimized out>) at
ioengines.c:237
#2  0x0000000000431bf7 in io_u_queued_complete
(td=td@entry=0x7fb0711e0ad0, min_evts=<optimized out>,
min_evts@entry=0) at io_u.c:1969
#3  0x000000000040086b in cleanup_pending_aio
(td=td@entry=0x7fb0711e0ad0) at backend.c:256
#4  0x0000000000451afb in do_io (bytes_done=<synthetic pointer>,
td=0x7fb0711e0ad0) at backend.c:1096
#5  thread_main (data=data@entry=0x1af7230) at backend.c:1714
#6  0x0000000000452c4c in run_threads (sk_out=sk_out@entry=0x0) at
backend.c:2268
#7  0x0000000000452f2d in fio_backend (sk_out=sk_out@entry=0x0) at
backend.c:2400
#8  0x00000000004030dc in main (argc=20, argv=0x7fff55c57788,
envp=<optimized out>) at fio.c:65

-- 
Sitsofe | http://sucs.org/~sits/
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux