Re: [PATCH] blk-throttle: limit bios to fix amount of pages entering writeback prematurely

Liu Bo <obuil.liubo@xxxxxxxxx> · Thu, 31 Jan 2019 13:20:49 -0800

On Thu, Jan 31, 2019 at 5:50 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 1/31/19 2:26 AM, Jan Kara wrote:
> > Hi!
> >
> > On Thu 31-01-19 10:03:34, Xiaoguang Wang wrote:
> >>> Currently in blk_throtl_bio(), if a bio exceeds its throtl_grp's bps
> >>> or iops limit, the bio is queued on the throtl_grp's throtl_service_queue.
> >>> The mm subsystem then keeps submitting more pages even though the underlying
> >>> device cannot handle these IO requests, so a large number of pages
> >>> enter writeback prematurely. If a process later writes to one of these
> >>> pages, it will wait for a long time.
> >>>
> >>> I have done some tests: one process does buffered writes to a 1GB file,
> >>> with this process's blkcg max bps limit set to 10MB/s. I observe this:
> >>>     #cat /proc/meminfo  | grep -i back
> >>>     Writeback:        900024 kB
> >>>     WritebackTmp:          0 kB
> >>>
> >>> I think this Writeback value is far too big: many bios have been
> >>> queued in the throtl_grp's throtl_service_queue. If a process tries to write
> >>> the page of the last bio in this queue, it calls wait_on_page_writeback(page),
> >>> which must wait for the previous bios to finish and takes a long time. We
> >>> have also seen 120s hung task warnings on our server.
> >>>
> >>> To fix this issue, we can simply limit the number of bios queued in the
> >>> throtl_service_queue. Currently we limit it to the throtl_grp's bps_limit or
> >>> iops limit; if it is still exceeded, we just sleep for a while.
> >> Ping :)
> >>
> >> The fix in this patch is not good. I have written a new patch that
> >> uses a wait queue, but do you think this is a blk-throttle design issue that
> >> needs fixing? Thanks.
> >
> > Well, essentially this is a priority inversion issue where a low-priority
> > process submits writes and a higher-priority process blocks on those, isn't
> > it? I think the blk-wbt throttling was meant to address these issues by
> > throttling the process already when submitting bios (i.e. something similar
> > to what you propose in your patch). I'll defer to Jens as a maintainer
> > whether he wants to redirect users to blk-wbt or whether improving
> > blk-throttle to avoid similar issues is desirable. Jens?
>
> I think that blk-throttle usage should be phased out and we can
> hopefully remove it at some point. I also don't think that there's a
> large user base for it, which is good, but it does seem active on the
> Alibaba front.
>

The concept of blk-throttle is attractive, but in practice it doesn't
help much with application QoS: both latency and throughput matter,
and blk-throttle is too coarse-grained to handle both.
So we're not going to stick with it forever.

Now that blk-iolatency is available, in theory it should help a bit.
However, another thing we're expecting is a proportional setting for
IO resource shares.

An "outstanding IO" or "queue depth" based solution sounds like a good
direction, but unlike blk-sq, where there is only one queue and its
depth is fixed, with blk-mq it is difficult to assign an accurate
depth to a given weight.
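
To illustrate the weight-to-depth mapping problem, here is a minimal
userspace sketch, not kernel code. It assumes the simple case of one
queue with a fixed total depth; split_depth_by_weight() and
example_split() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split a fixed total queue depth among groups in
 * proportion to their weights, rounding each share down.
 * Returns the depth left unassigned after rounding.
 */
unsigned int split_depth_by_weight(unsigned int total_depth,
                                   const unsigned int *weights,
                                   unsigned int *depths, size_t n)
{
    unsigned long long weight_sum = 0;
    unsigned int assigned = 0;
    size_t i;

    for (i = 0; i < n; i++)
        weight_sum += weights[i];

    /* No weights configured: nothing can be assigned. */
    if (weight_sum == 0) {
        for (i = 0; i < n; i++)
            depths[i] = 0;
        return total_depth;
    }

    for (i = 0; i < n; i++) {
        /* 64-bit intermediate avoids overflow of depth * weight. */
        depths[i] = (unsigned int)((unsigned long long)total_depth *
                                   weights[i] / weight_sum);
        assigned += depths[i];
    }
    return total_depth - assigned;
}

/* Assumed example values: total depth 32, two groups, weights 1 and 1000. */
void example_split(void)
{
    const unsigned int weights[2] = { 1, 1000 };
    unsigned int depths[2];
    unsigned int leftover = split_depth_by_weight(32, weights, depths, 2);

    assert(depths[0] == 0);   /* low-weight group rounds down to nothing */
    assert(depths[1] == 31);
    assert(leftover == 1);    /* one slot is lost to rounding */
}
```

Even with a fixed depth of 32, the group with weight 1 ends up with a
depth of 0 and would need some minimum enforced. With blk-mq there is
no single fixed total to split in the first place.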

thanks,
liubo