Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter

On 1/23/19 12:52 AM, Liu Bo wrote:
> On Fri, Jan 18, 2019 at 5:51 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 1/18/19 6:39 PM, Liu Bo wrote:
>>> On Fri, Jan 18, 2019 at 8:43 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jan 18, 2019 at 09:28:06AM -0700, Jens Axboe wrote:
>>>>> On 1/18/19 9:21 AM, Josef Bacik wrote:
>>>>>> On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote:
>>>>>>> On 1/14/19 12:21 PM, Liu Bo wrote:
>>>>>>>> Our test reported the following stack, and vmcore showed that
>>>>>>>> ->inflight counter is -1.
>>>>>>>>
>>>>>>>> [ffffc9003fcc38d0] __schedule at ffffffff8173d95d
>>>>>>>> [ffffc9003fcc3958] schedule at ffffffff8173de26
>>>>>>>> [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
>>>>>>>> [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
>>>>>>>> [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
>>>>>>>> [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
>>>>>>>> [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
>>>>>>>> [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
>>>>>>>> [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
>>>>>>>> [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
>>>>>>>> [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
>>>>>>>> [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
>>>>>>>> [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
>>>>>>>> [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
>>>>>>>> [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
>>>>>>>> [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
>>>>>>>> [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
>>>>>>>> [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
>>>>>>>> [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e
>>>>>>>>
>>>>>>>> The ->inflight counter may be negative (-1) if
>>>>>>>>
>>>>>>>> 0) blk-throttle had been enabled when the IO was issued, so its bio
>>>>>>>> has an associated blkg,
>>>>>>>>
>>>>>>>> 1) blk-iolatency was disabled when the IO was issued, so iolatency_grp
>>>>>>>> in this blkg was not available by then,
>>>>>>>>
>>>>>>>> 2) blk-iolatency was enabled before this IO reached its endio, so that
>>>>>>>> iolatency_grp became available when the IO did the endio.
>>>>>>>>
>>>>>>>> 3) the ->inflight counter is decreased from 0 to -1.
>>>>>>>>
>>>>>>>> This patch uses atomic_dec_if_positive() instead to avoid a negative
>>>>>>>> inflight counter.
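
A minimal sketch of the atomic_dec_if_positive() idea described in the quoted
patch text (the struct and helper names below are illustrative only, not the
actual blk-iolatency code):

#include <linux/atomic.h>
#include <linux/wait.h>

/* Illustrative stand-in for the throttling bookkeeping. */
struct sketch_rqw {
	atomic_t inflight;
	wait_queue_head_t wait;
};

/* Endio-side cleanup: never drive ->inflight below zero. */
static void sketch_iolat_cleanup(struct sketch_rqw *rqw)
{
	/*
	 * atomic_dec_if_positive() only performs the decrement when the
	 * counter is > 0, so a completion for a bio that was never counted
	 * at submit time (io.latency was off then) leaves it at 0, not -1.
	 */
	if (atomic_dec_if_positive(&rqw->inflight) >= 0)
		wake_up_all(&rqw->wait);
}
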
>>>>>>>
>>>>>>> The problem with that is that it'll hide a lot of other issues, too.
>>>>>>> Any way we can either track if this rqw is in flight, and only dec
>>>>>>> if it is, or quiesce when enabling?
>>>>>>>
>>>>>>
>>>>>> I worried about this too, but really the side effect of mis-counting is
>>>>>> just that we let more IO through.  I think maybe we add a debug option
>>>>>> that we can turn on to see if we're messing up the accounting, but in
>>>>>> general I don't see a problem with this approach.
>>>>>
>>>>> The problem is that an accounting bug elsewhere (like a missing increment)
>>>>> will now go unnoticed, which could completely screw it up. An occasional
>>>>> blip like the one described is totally fine, but that's not guaranteed to be
>>>>> the case.
>>>>>
>>>>
>>>> Yeah I agree it's kind of shitty.
>>>>
>>>>>> The problem we're running into here is there's not really a good way to tag a
>>>>>> bio as "seen by io.latency."  We just have to assume if we're on and there's a
>>>>>> bi_blkg associated that we saw it at submit time.  We can't just add a flag for
>>>>>> every io controller that starts tracking inflight io's, so for now I think this
>>>>>> is a reasonable solution.  Thanks,
>>>>>
>>>>> Can we quiesce instead when enabling/disabling?
>>>>
>>>> Actually that's not a bad idea, you want to look into that Liu?  That way we can
>>>> make sure everything is correct always.  Thanks,
>>>>
>>>
>>> Just to confirm, does "quiesce when enabling/disabling" mean "waiting
>>> for inflight IOs to complete when enabling/disabling iolatency"?
>>
> 
> So it might not be easy to quiesce the queue, because...
> 
> iolatency_set_limit()
>    blkg_conf_prep() // returns with rcu read lock and the queue lock held
> 
> neither freeze_queue nor quiesce_queue is allowed in an RCU read-side context.
> 
> If we do it outside of blkg_conf_prep/finish(), I feel like the
> overhead is more than it's worth.
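
A rough sketch of the constraint Liu Bo is pointing at (the body is
simplified and is not a working patch; it only marks where the freeze would
have to happen):

static ssize_t iolatency_set_limit_sketch(struct kernfs_open_file *of,
					  char *buf, size_t nbytes)
{
	struct blkcg *blkcg = css_to_blkcg(of_css(of));
	struct blkg_conf_ctx ctx;
	int ret;

	/* Returns with rcu_read_lock() held and the queue lock taken. */
	ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx);
	if (ret)
		return ret;

	/*
	 * We are now in an RCU read-side critical section, so calling
	 * blk_mq_freeze_queue(ctx.disk->queue) or blk_mq_quiesce_queue()
	 * here would block in a context where that is not allowed.
	 */

	blkg_conf_finish(&ctx);
	return nbytes;
}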

I think it's hard to say it's more overhead than it's worth, when the
current situation is clearly buggy and the proposed fix has definite
flaws. The only other alternative I can see is to have a flag per
throttling entity, which isn't going to work due to flag shortage.

So I'd very much encourage you to continue going down the path of making
quiesce work. You could potentially do a callback-based solution, if the
context within prep/finish is too limiting?
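
For reference, one possible shape of doing the freeze outside of
blkg_conf_prep()/blkg_conf_finish(), with the actual state change passed in
as a callback; this is only a sketch of the direction suggested above, not a
proposed patch, and the function name is made up:

/*
 * Freeze the queue around an enable/disable state change so that no IO
 * can be in flight while the accounting is switched on or off.
 */
static void iolatency_update_frozen_sketch(struct request_queue *q,
					   void (*update)(struct request_queue *q,
							  void *arg),
					   void *arg)
{
	/* Drains all in-flight requests and blocks new submissions. */
	blk_mq_freeze_queue(q);

	/*
	 * Any bio counted at submit time should have completed its endio by
	 * now, so flipping the enable/disable state cannot race with the
	 * submit/endio accounting.
	 */
	update(q, arg);

	blk_mq_unfreeze_queue(q);
}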

-- 
Jens Axboe



