Patch v2 makes the adjustment work as a guard against over-preemption of
CFS tasks, and it only takes effect for READ requests.

On Tue, Feb 20, 2024 at 7:46 PM zhaoyang.huang <zhaoyang.huang@xxxxxxxxxx> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>
> Under the current policy, CFS tasks may suffer involuntary IO latency
> when they are preempted by RT/DL tasks or IRQs, since the latter are
> privileged in both the CPU and IO schedulers. This commit introduces an
> approximate, lightweight method to reduce this effect by scaling the
> expire time by the CFS proportion of the total CPU active time.
> The average utilization of the CPU's run queue reflects the historical
> active proportion of the different types of tasks, which is valid for
> this purpose from the following three perspectives:
>
> 1. The load (util) of every sched class is tracked and calculated in the
> same way (using a geometric series, known as PELT).
> 2. The legacy policy is kept: the rq's position in fifo_list is NOT
> adjusted, only its expire_time changes.
> 3. The fixed expire time (hundreds of ms) is in the same range as the
> accounting series of the CPU's avg_load (the utilization decays to 0.5
> in 32ms).
>
> TaskA
> sched in
> |
> |
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire
> (insert_request)
>
> TaskB
> sched in
> |
> |
> vfs_xxx
> |
> |preempted by RT,DL,IRQ
> |\
> | This period is unfair to TaskB's IO request and should be adjusted
> |/
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire * CFS_PROPORTION(rq)
> (insert_request)
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> ---
> changes in v2: introduce a direction and a threshold so that the hack
> works as a guard against CFS over-preemption.
> ---
> ---
>  block/mq-deadline.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..b5aa544d69a3 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -54,6 +54,7 @@ enum dd_prio {
>
>  enum { DD_PRIO_COUNT = 3 };
>
> +#define CFS_PROP_THRESHOLD 60
>  /*
>   * I/O statistics per I/O priority. It is fine if these counters overflow.
>   * What matters is that these counters are at least as wide as
> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
> +	int fifo_expire;
>
>  	lockdep_assert_held(&dd->lock);
>
> @@ -839,8 +841,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>
>  		/*
>  		 * set expire time and add to fifo list
> +		 * The expire time is adjusted when the current CFS task is
> +		 * over-preempted by RT/DL/IRQ, which is calculated from the
> +		 * proportion of CFS's active time among whole cpu time during
> +		 * the last several dozen ms. However, this does NOT affect the
> +		 * rq's position in fifo_list but only takes effect when this
> +		 * rq is checked for its expire time when at head.
>  		 */
> -		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> +		fifo_expire = dd->fifo_expire[data_dir];
> +		if (data_dir == DD_READ &&
> +			(cfs_prop_by_util(current, 100) < CFS_PROP_THRESHOLD))
> +			fifo_expire = cfs_prop_by_util(current, dd->fifo_expire[data_dir]);
> +
> +		rq->fifo_time = jiffies + fifo_expire;
> +
>  		insert_before = &per_prio->fifo_list[data_dir];
>  #ifdef CONFIG_BLK_DEV_ZONED
>  		/*
> --
> 2.25.1
>
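
To make the arithmetic of the guard easier to follow, here is a small userspace sketch of the READ-only threshold logic in the hunk above. The definition of cfs_prop_by_util() is not part of this diff, so the sketch assumes it returns its second argument scaled by the CFS share of the summed CFS/RT/DL/IRQ utilization. The struct rq_util type, its fields, and the helper names cfs_prop() and adjusted_expire() are hypothetical and exist only for illustration.

```c
/* Threshold in percent, as defined by the patch. */
#define CFS_PROP_THRESHOLD 60

/*
 * Hypothetical snapshot of per-class utilization on one CPU.
 * The field names are assumptions, not kernel structures.
 */
struct rq_util {
	unsigned long cfs;
	unsigned long rt;
	unsigned long dl;
	unsigned long irq;
};

/*
 * Assumed behaviour of cfs_prop_by_util(): scale val by the CFS share
 * of the total utilization. The real helper is not shown in this diff.
 */
unsigned long cfs_prop(const struct rq_util *u, unsigned long val)
{
	unsigned long total = u->cfs + u->rt + u->dl + u->irq;

	if (total == 0)
		return val;	/* no recorded activity: leave val unscaled */
	return val * u->cfs / total;
}

/*
 * Mirror of the guard in dd_insert_request(): only READ requests are
 * adjusted, and only when the CFS share drops below the threshold.
 */
unsigned long adjusted_expire(const struct rq_util *u, int is_read,
			      unsigned long expire)
{
	if (is_read && cfs_prop(u, 100) < CFS_PROP_THRESHOLD)
		return cfs_prop(u, expire);
	return expire;
}
```

Under these assumptions, a CPU whose utilization is split cfs=300, rt=500, dl=100, irq=100 gives a CFS share of 30%, so a READ expire of 500 is shortened to 150, while a CPU with cfs=800 out of 1000 (80%) keeps the full 500. WRITE requests are never adjusted.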