On 07/26/2016 02:43 AM, Jan Kara wrote:
On Mon 25-07-16 09:21:00, Jens Axboe wrote:
On 07/19/2016 11:08 PM, Jan Kara wrote:
On Thu 16-06-16 09:06:51, Jens Axboe wrote:
On 06/15/2016 04:45 PM, Jan Kara wrote:
On Sat 11-06-16 21:30:00, Jens Axboe wrote:
On 06/11/2016 08:56 PM, Jens Axboe wrote:
On 06/10/2016 12:42 PM, Jeff Furlong wrote:
Good point. Here is the trace:
[New LWP 59231]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
1909            if (!cur_log) {
(gdb) bt
#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
#1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at stat.c:1965
#2  0x000000000040ca90 in wait_for_completions (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at backend.c:446
#3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>, td=0x7f8277de0000) at backend.c:991
#4  thread_main (data=data@entry=0x264d450) at backend.c:1667
#5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2217
#6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2349
#7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, envp=<optimized out>) at fio.c:63
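For reference, the crash site is fio's lazy log growth: once the current chunk of samples fills up, regrow_log() hangs a fresh chunk off the log, and stat.c:1909 is the NULL check on that path. A minimal sketch of the pattern follows; the structures and names here are illustrative, not fio's actual code.

#include <stdio.h>
#include <stdlib.h>

struct io_sample {
        unsigned long time;
        unsigned long val;
};

/* One fixed-size chunk of samples; chunks are chained as the log grows. */
struct io_log_chunk {
        struct io_sample *samples;
        size_t nr_samples;
        size_t max_samples;
        struct io_log_chunk *next;
};

struct io_log {
        struct io_log_chunk *cur_log;   /* chunk currently being filled */
        size_t chunk_size;              /* samples per chunk */
};

/* Illustrative stand-in for regrow_log(): allocate the next chunk.
 * A NULL test like the one below normally only catches allocation
 * failure; faulting on it, as in the trace above, suggests the log
 * structure itself was no longer valid memory. */
static int grow_log(struct io_log *iolog)
{
        struct io_log_chunk *cur_log;

        cur_log = calloc(1, sizeof(*cur_log));
        if (!cur_log)
                return 1;

        cur_log->samples = calloc(iolog->chunk_size, sizeof(struct io_sample));
        if (!cur_log->samples) {
                free(cur_log);
                return 1;
        }
        cur_log->max_samples = iolog->chunk_size;
        cur_log->next = iolog->cur_log;
        iolog->cur_log = cur_log;
        return 0;
}

int main(void)
{
        struct io_log log = { .cur_log = NULL, .chunk_size = 1024 };

        if (grow_log(&log))
                return 1;
        printf("first chunk holds %zu samples\n", log.cur_log->max_samples);
        return 0;
}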
That looks odd, thanks for reporting it. I'll see if I can get to this
on Monday; if not, it'll have to wait until after my vacation... So
while I appreciate people running -git and finding issues like these
before they show up in a release, it might be best to revert back to
2.11 until I can get this debugged.
I take that back - continue using -git! Just pull a fresh copy; it
should be fixed now.
Jan, the reporter is right: 2.11 works and -git does not. I ran a
quick bisect, changing the logging interval from every second to every
100ms to make it reproduce faster. I don't have time to look into why
yet, so I've just reverted the commit:
commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
Author: Jan Kara <jack@xxxxxxx>
Date: Tue May 24 17:03:21 2016 +0200
fio: Simplify forking of processes
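For context, the patch touches fio's process mode: without --thread, each job runs as a forked child and the parent reaps the workers. Below is a minimal sketch of that general fork-per-job shape, illustrative only; run_job() is a hypothetical stand-in for fio's thread_main().

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-job worker; fio's real entry point is thread_main(). */
static int run_job(int jobno)
{
        printf("job %d running as pid %d\n", jobno, getpid());
        fflush(stdout);         /* flush before _exit() in the child */
        return 0;
}

int main(void)
{
        int nr_jobs = 4;

        for (int i = 0; i < nr_jobs; i++) {
                pid_t pid = fork();

                if (pid < 0) {
                        perror("fork");
                        exit(1);
                }
                if (pid == 0)   /* child: run the job, then exit */
                        _exit(run_job(i));
        }

        /* parent: reap all workers */
        for (int i = 0; i < nr_jobs; i++) {
                int status;

                if (waitpid(-1, &status, 0) < 0)
                        perror("waitpid");
        }
        return 0;
}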
Hmm, I've tried reproducing this but failed (I tried using /dev/ram0 and
/dev/sda4 as the devices for fio). Is it somehow dependent on the
device fio works with? For testing I used commit
54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
patch).
I'm on vacation right now; I'll check when I get back. It's possible that
it was just a fluke, since there was another bug in that area related to
shared memory, but it was crashing predictably at the same spot during
the bisect. It doesn't make a lot of sense, however.
Did you have a chance to look into this?
I have not, unfortunately, but I suspect the original patch is fine and
that the later fix, allocating cur_log out of the shared memory pool,
was the real fix.
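To spell out why the shared pool matters: fio forks a process per job (unless --thread is used), so anything a worker allocates with plain malloc() lives only in that child's address space, and a pointer to it stashed in shared state is garbage when followed from another process. fio's smalloc()/sfree() allocate from a shared memory pool precisely to avoid this. Here is a minimal sketch of the failure mode, using a bare MAP_SHARED mapping to stand in for fio's pool; the struct and names are illustrative, not fio's code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_state {
        char *log;      /* pointer filled in by the worker */
};

int main(void)
{
        /* State visible to both parent and child, like fio's shared pool. */
        struct shared_state *st = mmap(NULL, sizeof(*st),
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pid_t pid;

        if (st == MAP_FAILED)
                return 1;

        pid = fork();
        if (pid < 0)
                return 1;
        if (pid == 0) {
                /* BUG: malloc() here is private to the child; the parent
                 * will see the pointer value but not the memory behind it. */
                st->log = malloc(64);
                strcpy(st->log, "child data");
                _exit(0);
        }
        wait(NULL);

        /* Printing the pointer value is fine, but dereferencing st->log
         * here would be undefined behavior: the address was only ever
         * valid in the child. Allocating the buffer from the shared
         * mapping instead (as fio's shared pool does) avoids this. */
        printf("parent sees pointer %p\n", (void *)st->log);
        return 0;
}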
That's what I'd suspect as well, but since I'm not able to reproduce even
the original crash, I cannot verify this theory... What's the plan going
forward? Will you re-apply the patch? Frankly, I don't care much, it was
just a small cleanup; I'm just curious whether it was really that other
bug or whether I'm missing something.
Yes, I think re-applying it would be the best way forward, especially
since 2.13 was just released, so we'll have a while to iron out any
issues. But I really don't see how the patch could be the cause of the
problem; I'm guessing it just exacerbated it somehow.
--
Jens Axboe