Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 18, 2019 at 5:51 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 1/18/19 6:39 PM, Liu Bo wrote:
> > On Fri, Jan 18, 2019 at 8:43 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >>
> >> On Fri, Jan 18, 2019 at 09:28:06AM -0700, Jens Axboe wrote:
> >>> On 1/18/19 9:21 AM, Josef Bacik wrote:
> >>>> On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote:
> >>>>> On 1/14/19 12:21 PM, Liu Bo wrote:
> >>>>>> Our test reported the following stack, and vmcore showed that
> >>>>>> ->inflight counter is -1.
> >>>>>>
> >>>>>> [ffffc9003fcc38d0] __schedule at ffffffff8173d95d
> >>>>>> [ffffc9003fcc3958] schedule at ffffffff8173de26
> >>>>>> [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
> >>>>>> [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
> >>>>>> [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
> >>>>>> [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
> >>>>>> [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
> >>>>>> [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
> >>>>>> [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
> >>>>>> [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
> >>>>>> [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
> >>>>>> [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
> >>>>>> [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
> >>>>>> [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
> >>>>>> [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
> >>>>>> [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
> >>>>>> [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
> >>>>>> [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
> >>>>>> [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e
> >>>>>>
> >>>>>> The ->inflight counter may be negative (-1) if
> >>>>>>
> >>>>>> 0) blk-throttle had been enabled when the IO was issued, so its bio
> >>>>>> has a associated blkg,
> >>>>>>
> >>>>>> 1) blk-iolatency was disabled when the IO was issued, so iolatency_grp
> >>>>>> in this blkg was not available by then,
> >>>>>>
> >>>>>> 2) blk-iolatency was enabled before this IO reached its endio, so that
> >>>>>> iolatency_grp became available when the IO did the endio.
> >>>>>>
> >>>>>> 3) the ->inflight counter is decreased from 0 to -1.
> >>>>>>
> >>>>>> This uses atomic_dec_is_positive() instead to avoid the negative
> >>>>>> inflight counter.
> >>>>>
> >>>>> The problem with that is that it'll hide a lot of other issues, too.
> >>>>> Any way we can either track if this rqw is in flight, and only dec
> >>>>> if it is, or quiesce when enabling?
> >>>>>
> >>>>
> >>>> I worried about this too, but really the side-effect of allowing more through
> >>>> because of mis-counting means we just let more IO through.  I think maybe we add
> >>>> a debug option that we can turn on to see if we're messing up accounting, but in
> >>>> general I don't see a problem with this approach.
> >>>
> >>> The problem is that a problem in accounting elsewhere (like missing increment)
> >>> will now go unnoticed, which could completely screw it up. An occasional
> >>> blip like the one described is totally fine, but that's not guaranteed to be
> >>> the case.
> >>>
> >>
> >> Yeah I agree it's kind of shitty.
> >>
> >>>> The problem we're running into here is there's not really a good way to tag a
> >>>> bio as "seen by io.latency."  We just have to assume if we're on and there's a
> >>>> bi_blkg associated that we saw it at submit time.  We can't just add a flag for
> >>>> every io controller that starts tracking inflight io's, so for now I think this
> >>>> is a reasonable solution.  Thanks,
> >>>
> >>> Can we quiesce instead when enabling/disabling?
> >>
> >> Actually that's not a bad idea, you want to look into that Liu?  That way we can
> >> make sure everything is correct always.  Thanks,
> >>
> >
> > Just to confirm, does "quiesce when enabling/disabling" mean "waiting
> > for inflight IOs to complete when enabling/disabling iolatency"?
>

So it might not be an easy thing to quiesce queue because...

iolatency_set_limit()
   blkg_conf_prep() // return with rcu and queue's lock

neither freeze_queue nor quiesce_queue is allowed in rcu reader side context.

If we do it outside of blkg_conf_prep/finish(), I feel like the
overhead is more than it worths.

Any thoughts?

thanks,
liubo

> Precisely.
>
> --
> Jens Axboe
>



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux