On Thu, Jan 31, 2019 at 5:50 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 1/31/19 2:26 AM, Jan Kara wrote:
> > Hi!
> >
> > On Thu 31-01-19 10:03:34, Xiaoguang Wang wrote:
> >>> Currently in blk_throtl_bio(), if one bio exceeds its throtl_grp's bps
> >>> or iops limit, this bio will be queued throtl_grp's throtl_service_queue,
> >>> then obviously mm subsys will submit more pages, even underlying device
> >>> can not handle these io requests, also this will make large amount of pages
> >>> entering writeback prematurely, later if some process writes some of these
> >>> pages, it will wait for long time.
> >>>
> >>> I have done some tests: one process does buffered writes on a 1GB file,
> >>> and make this process's blkcg max bps limit be 10MB/s, I observe this:
> >>> #cat /proc/meminfo | grep -i back
> >>> Writeback: 900024 kB
> >>> WritebackTmp: 0 kB
> >>>
> >>> I think this Writeback value is just too big, indeed many bios have been
> >>> queued in throtl_grp's throtl_service_queue, if one process try to write
> >>> the last bio's page in this queue, it will call wait_on_page_writeback(page),
> >>> which must wait the previous bios to finish and will take long time, we
> >>> have also see 120s hung task warning in our server.
> >>>
> >>> To fix this issue, we can simply limit throtl_service_queue's max queued
> >>> bios, currently we limit it to throtl_grp's bps_limit or iops limit, if it
> >>> still exceeds, we just sleep for a while.
> >> Ping :)
> >>
> >> The fix method in this patch is not good, I had written a new patch that
> >> uses wait queue, but do you think this is a blk-throttle design issue and
> >> needs fixing? thanks.
> >
> > Well, essentially this is a priority inversion issue where low-priority
> > process submits writes and higher priority process blocks on those, isn't
> > it? I think the blk-wbt throttling was meant to address these issues by
> > throttling the process already when submitting bios (i.e. something similar
> > to what you propose in your patch). I'll defer to Jens as a maintainer
> > whether he wants to redirect users to blk-wbt or whether improving
> > blk-throttle to avoid similar issues is desirable. Jens?
>
> I think that blk-throttle usage should be phased out and we can
> hopefully remove it at some point. I also don't think that there's a
> large use base of it, which is good, but does seem active on the Alibaba
> front.
>

The concept of blk-throttle is attractive, but in practice it doesn't
do much for applications' QoS, because both latency and throughput
matter and blk-throttle is a bit coarse-grained in this respect. So
we're not going to stick with it forever.

Now that blk-iolatency is available, in theory it should help a bit.
However, another thing we're expecting is a proportional setting for
IO resource shares.

An "outstanding IO" or "queue depth" based solution sounds like a good
direction, but unlike blk-sq, where there's only one queue and the
queue depth is fixed, with blk-mq it's difficult to assign an accurate
depth for a weight.

thanks,
liubo
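
To make the "depth for a weight" point a bit more concrete, here is a minimal user-space sketch, not kernel code. It only shows one facet of the problem: turning proportional weights into integer queue depths loses accuracy when the depth is small. The function name split_depth, the group names, and the weights are all hypothetical, and the largest-remainder rounding is just one possible choice, not something blk-mq or any existing controller is known to do.

```python
def split_depth(total_depth, weights):
    """Split an integer queue depth across groups in proportion to weights.

    Hypothetical helper for illustration only. Uses the largest-remainder
    method so that the returned shares always sum to total_depth.
    """
    if total_depth < 0:
        raise ValueError("total_depth must be non-negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")

    total_weight = sum(weights.values())
    shares = {}
    remainders = {}
    for name, weight in weights.items():
        # Exact quota is total_depth * weight / total_weight; keep the
        # integer part as the share and remember the remainder.
        shares[name], remainders[name] = divmod(total_depth * weight,
                                                total_weight)

    # Hand the slots lost to rounding to the groups with the largest
    # remainders (ties broken by name to keep the result deterministic).
    leftover = total_depth - sum(shares.values())
    by_remainder = sorted(weights, key=lambda n: (-remainders[n], n))
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    weights = {"a": 100, "b": 200, "c": 700}  # assumed cgroup weights
    for depth in (32, 4):
        print(depth, split_depth(depth, weights))
```

With these assumed weights, a depth of 32 still comes out close to 10/20/70 percent, but at a depth of 4 group "a" gets no slot at all and "b" ends up with 25 percent instead of 20. If a device exposes several hardware queues, each with its own depth, the same rounding would have to be repeated per queue, which is one plausible reading of why an accurate mapping is hard.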