On Thu, Mar 07, 2019 at 05:10:53PM -0500, Josef Bacik wrote:
> On Thu, Mar 07, 2019 at 07:08:32PM +0100, Andrea Righi wrote:
> > Prevent priority inversion problem when a high-priority blkcg issues a
> > sync() and it is forced to wait the completion of all the writeback I/O
> > generated by any other low-priority blkcg, causing massive latencies to
> > processes that shouldn't be I/O-throttled at all.
> >
> > The idea is to save a list of blkcg's that are waiting for writeback:
> > every time a sync() is executed the current blkcg is added to the list.
> >
> > Then, when I/O is throttled, if there's a blkcg waiting for writeback
> > different than the current blkcg, no throttling is applied (we can
> > probably refine this logic later, i.e., a better policy could be to
> > adjust the throttling I/O rate using the blkcg with the highest speed
> > from the list of waiters - priority inheritance, kinda).
> >
> > Signed-off-by: Andrea Righi <andrea.righi@xxxxxxxxxxxxx>
> > ---
> >  block/blk-cgroup.c               | 131 +++++++++++++++++++++++++++++++
> >  block/blk-throttle.c             |  11 ++-
> >  fs/fs-writeback.c                |   5 ++
> >  fs/sync.c                        |   8 +-
> >  include/linux/backing-dev-defs.h |   2 +
> >  include/linux/blk-cgroup.h       |  23 ++++++
> >  mm/backing-dev.c                 |   2 +
> >  7 files changed, 178 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 2bed5725aa03..4305e78d1bb2 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -1351,6 +1351,137 @@ struct cgroup_subsys io_cgrp_subsys = {
> >  };
> >  EXPORT_SYMBOL_GPL(io_cgrp_subsys);
> >
> > +#ifdef CONFIG_CGROUP_WRITEBACK
> > +struct blkcg_wb_sleeper {
> > +	struct backing_dev_info *bdi;
> > +	struct blkcg *blkcg;
> > +	refcount_t refcnt;
> > +	struct list_head node;
> > +};
> > +
> > +static DEFINE_SPINLOCK(blkcg_wb_sleeper_lock);
> > +static LIST_HEAD(blkcg_wb_sleeper_list);
> > +
> > +static struct blkcg_wb_sleeper *
> > +blkcg_wb_sleeper_find(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->blkcg == blkcg && bws->bdi == bdi)
> > +			return bws;
> > +	return NULL;
> > +}
> > +
> > +static void blkcg_wb_sleeper_add(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_add(&bws->node, &blkcg_wb_sleeper_list);
> > +}
> > +
> > +static void blkcg_wb_sleeper_del(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_del_init(&bws->node);
> > +}
> > +
> > +/**
> > + * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device
> > + * @blkcg: current blkcg cgroup
> > + * @bdi: block device to check
> > + *
> > + * Return true if any other blkcg different than the current one is waiting for
> > + * writeback on the target block device, false otherwise.
> > + */
> > +bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +	bool ret = false;
> > +
> > +	spin_lock(&blkcg_wb_sleeper_lock);
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->bdi == bdi && bws->blkcg != blkcg) {
> > +			ret = true;
> > +			break;
> > +		}
> > +	spin_unlock(&blkcg_wb_sleeper_lock);
> > +
> > +	return ret;
> > +}
>
> No global lock please, add something to the bdi I think?  Also have a fast path
> of

OK, I'll add a list per-bdi and a lock as well.

>
> if (list_empty(blkcg_wb_sleeper_list))
> 	return false;

OK.

>
> we don't need to be super accurate here.  Thanks,
>
> Josef

Thanks,
-Andrea