Re: Observing an fio hang with RNBD device in the latest v5.10.78 Linux kernel

On Mon, Nov 15, 2021 at 02:01:32PM +0100, Haris Iqbal wrote:
> On Wed, Nov 10, 2021 at 2:39 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > Hello Haris,
> >
> > On Tue, Nov 09, 2021 at 10:32:32AM +0100, Haris Iqbal wrote:
> > > Hi,
> > >
> > > We are observing an fio hang with the latest v5.10.78 Linux kernel
> > > version with RNBD. The setup is as follows,
> > >
> > > On the server side, 16 nullblk devices.
> > > On the client side, map those 16 block devices through RNBD-RTRS.
> > > Change the scheduler for those RNBD block devices to mq-deadline.
> > >
> > > Run fio with the following configuration.
> > >
> > > [global]
> > > description=Emulation of Storage Server Access Pattern
> > > bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4
> > > fadvise_hint=0
> > > rw=randrw:2
> > > direct=1
> > > random_distribution=zipf:1.2
> > > time_based=1
> > > runtime=60
> > > ramp_time=1
> > > ioengine=libaio
> > > iodepth=128
> > > iodepth_batch_submit=128
> > > iodepth_batch_complete=128
> > > numjobs=1
> > > group_reporting
> > >
> > >
> > > [job1]
> > > filename=/dev/rnbd0
> > > [job2]
> > > filename=/dev/rnbd1
> > > [job3]
> > > filename=/dev/rnbd2
> > > [job4]
> > > filename=/dev/rnbd3
> > > [job5]
> > > filename=/dev/rnbd4
> > > [job6]
> > > filename=/dev/rnbd5
> > > [job7]
> > > filename=/dev/rnbd6
> > > [job8]
> > > filename=/dev/rnbd7
> > > [job9]
> > > filename=/dev/rnbd8
> > > [job10]
> > > filename=/dev/rnbd9
> > > [job11]
> > > filename=/dev/rnbd10
> > > [job12]
> > > filename=/dev/rnbd11
> > > [job13]
> > > filename=/dev/rnbd12
> > > [job14]
> > > filename=/dev/rnbd13
> > > [job15]
> > > filename=/dev/rnbd14
> > > [job16]
> > > filename=/dev/rnbd15
> > >
> > > Some of the fio threads hang and fio never finishes.
> > >
> > > fio fio.ini
> > > job1: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job2: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job3: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job4: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job5: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job6: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job7: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job8: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job9: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job10: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job11: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job12: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job13: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job14: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job15: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > job16: (g=0): rw=randrw, bs=(R) 512B-64.0KiB, (W) 512B-64.0KiB, (T)
> > > 512B-64.0KiB, ioengine=libaio, iodepth=128
> > > fio-3.12
> > > Starting 16 processes
> > > Jobs: 16 (f=12):
> > > [m(3),/(2),m(5),/(1),m(1),/(1),m(3)][0.0%][r=130MiB/s,w=130MiB/s][r=14.7k,w=14.7k
> > > IOPS][eta 04d:07h:4
> > > Jobs: 15 (f=11):
> > > [m(3),/(2),m(5),/(1),_(1),/(1),m(3)][51.2%][r=7395KiB/s,w=6481KiB/s][r=770,w=766
> > > IOPS][eta 01m:01s]
> > > Jobs: 15 (f=11): [m(3),/(2),m(5),/(1),_(1),/(1),m(3)][52.7%][eta 01m:01s]
> > >
> > > We checked the block devices, and there are requests waiting in their
> > > fifos (not on all devices, just a few whose corresponding fio threads
> > > are hung).
> > >
> > > $ cat /sys/kernel/debug/block/rnbd0/sched/read_fifo_list
> > > 00000000ce398aec {.op=READ, .cmd_flags=,
> > > .rq_flags=SORTED|ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1,
> > > .internal_tag=209}
> > > 000000005ec82450 {.op=READ, .cmd_flags=,
> > > .rq_flags=SORTED|ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1,
> > > .internal_tag=210}
> > >
> > > $ cat /sys/kernel/debug/block/rnbd0/sched/write_fifo_list
> > > 000000000c1557f5 {.op=WRITE, .cmd_flags=SYNC|IDLE,
> > > .rq_flags=SORTED|ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1,
> > > .internal_tag=195}
> > > 00000000fc6bfd98 {.op=WRITE, .cmd_flags=SYNC|IDLE,
> > > .rq_flags=SORTED|ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1,
> > > .internal_tag=199}
> > > 000000009ef7c802 {.op=WRITE, .cmd_flags=SYNC|IDLE,
> > > .rq_flags=SORTED|ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1,
> > > .internal_tag=217}
> >
> > Can you post the whole debugfs log for rnbd0?
> >
> > (cd /sys/kernel/debug/block/rnbd0 && find . -type f -exec grep -aH . {} \;)
> 
> Attached the logfile.

The logfile just shows that there are requests in the scheduler queue.

> 
> >
> > >
> > >
> > > Potential points which fix the hang:
> > >
> > > 1) Using no scheduler (none) on the client side RNBD block devices
> > > results in no hang.
> > >
> > > 2) In the fio config, changing the line "iodepth_batch_complete=128"
> > > to the following fixes the hang,
> > > iodepth_batch_complete_min=1
> > > iodepth_batch_complete_max=128
> > > OR,
> > > iodepth_batch_complete=0
> > >
> > > 3) We also tracked down the version in which the hang started. The
> > > hang started with v5.10.50, and the following commit is the one that
> > > results in the hang:
> > >
> > > commit 512106ae2355813a5eb84e8dc908628d52856890
> > > Author: Ming Lei <ming.lei@xxxxxxxxxx>
> > > Date:   Fri Jun 25 10:02:48 2021 +0800
> > >
> > >     blk-mq: update hctx->dispatch_busy in case of real scheduler
> > >
> > >     [ Upstream commit cb9516be7708a2a18ec0a19fe3a225b5b3bc92c7 ]
> > >
> > >     Commit 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > >     starts to support io batching submission by using hctx->dispatch_busy.
> > >
> > >     However, blk_mq_update_dispatch_busy() isn't changed to update
> > > hctx->dispatch_busy
> > >     in that commit, so fix the issue by updating hctx->dispatch_busy in case
> > >     of real scheduler.
> > >
> > >     Reported-by: Jan Kara <jack@xxxxxxx>
> > >     Reviewed-by: Jan Kara <jack@xxxxxxx>
> > >     Fixes: 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > >     Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > >     Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@xxxxxxxxxx
> > >     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > >     Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 00d6ed2fe812..a368eb6dc647 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1242,9 +1242,6 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
> > >  {
> > >         unsigned int ewma;
> > >
> > > -       if (hctx->queue->elevator)
> > > -               return;
> > > -
> > >         ewma = hctx->dispatch_busy;
> > >
> > >         if (!ewma && !busy)
> > >
> > > We reverted the commit and tested and there is no hang.
> > >
> > > 4) Lastly, we tested a newer version, v5.13, and there is NO hang
> > > there either. Hence, probably some other change fixed it.
> >
> > Can you observe the issue on v5.10? Maybe there is a pre-patch of commit
> > cb9516be7708 ("blk-mq: update hctx->dispatch_busy in case of real scheduler")
> > which was missed for 5.10.y.
> 
> If you mean v5.10.0, then no, I see no hang there. As I mentioned
> before, there is no hang till v5.10.49.
> 
> >
> > And I don't remember any fix for commit cb9516be7708 in mainline.
> >
> > commit cb9516be7708 was merged in v5.14, not v5.13; did you test v5.14 or v5.15?
> >
> > BTW, commit cb9516be7708 should just affect performance; it is not
> > supposed to cause a hang.
> 
> True. It does look like that from the small code change.

->dispatch_busy just affects the batch size for dequeuing requests
from the scheduler queue; see the following code in
__blk_mq_do_dispatch_sched():

        if (hctx->dispatch_busy)
                max_dispatch = 1;
        else
                max_dispatch = hctx->queue->nr_requests;
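
For background, ->dispatch_busy is an EWMA that
blk_mq_update_dispatch_busy() maintains from the outcome of each
dispatch. From memory, with commit cb9516be7708 applied (i.e. without
the reverted early return for q->elevator) it looks roughly like the
following; please check your tree for the exact code:

static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
{
        unsigned int ewma;

        ewma = hctx->dispatch_busy;

        /* stay at zero while dispatches keep succeeding */
        if (!ewma && !busy)
                return;

        /* decay the old value and mix in the new sample */
        ewma *= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT - 1;
        if (busy)
                ewma += 1 << BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR;
        ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;

        hctx->dispatch_busy = ewma;
}

So a run of dispatches that hit BLK_STS_RESOURCE/BLK_STS_DEV_RESOURCE
pushes the ewma up and shrinks the batch size to 1, and a run of
successful dispatches decays it back to 0.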


Also see blk_mq_do_dispatch_sched():

static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
        int ret;

        do {
                ret = __blk_mq_do_dispatch_sched(hctx);
        } while (ret == 1);

        return ret;
}

If any request can be dispatched to the driver, blk-mq will continue to
dispatch until -EAGAIN or 0 is returned from __blk_mq_do_dispatch_sched().

In case of -EAGAIN, blk-mq will try again to dispatch requests, and will
run the queue asynchronously if another -EAGAIN is returned.
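
That retry is done by the caller, blk_mq_sched_dispatch_requests();
from memory the relevant part is roughly the following (paraphrased,
please check your tree for the exact code):

        /*
         * -EAGAIN indicates that hctx->dispatch is not empty, so
         * dispatch again to avoid starving those requests; if the
         * retry also returns -EAGAIN, punt to an async queue run
         * instead of looping here.
         */
        if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) {
                if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN)
                        blk_mq_run_hw_queue(hctx, true);
        }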

In case 0 is returned, blk_mq_dispatch_rq_list() can't make progress;
the requests will be moved to hctx->dispatch, and blk-mq covers the
queue re-run until all requests in hctx->dispatch are dispatched, then
it goes back to dispatching requests from the scheduler queue.

So the current code does provide forward progress, and I don't see how
the patch 'blk-mq: update hctx->dispatch_busy in case of real scheduler'
can cause this issue.

Also, I don't see any request in hctx->dispatch in the debugfs log.

> 
> I wasn't able to test in v5.14 and v5.15 because we are seeing some
> other errors in those versions, most probably related to the
> rdma-core/rxe driver.

I'd rather see test results on an upstream kernel: v5.14, v5.15, or a
recent v5.15-rc.



Thanks,
Ming



