Re: Regarding patch "block/blk-mq: Don't complete locally if capacities are different"

Qais Yousef <qyousef@xxxxxxxxxxx> · Fri, 9 Aug 2024 01:23:21 +0100

On 08/05/24 11:18, Christian Loehle wrote:

> > My understanding of rq_affinity=1 is to match the perf of requester. Given that
> > the characteristic of HMP system is that power has an equal importance to perf
> > (I think this now has become true for all systems by the way), saying that the
> > match in one direction is better than the other is sort of forcing a policy of
> > perf first which I don't think is a good thing to enforce. We don't have enough
> > info to decide at this level. And our users care about both.
> 
> I would argue rq_affinity=1 matches the perf, so that flag should already bias
> perf in favor of power slightly?

Not on this type of system. If perf were the only thing that mattered, you'd
just use equally big CPUs. Balancing perf and power is important on these
systems, and I don't think we have enough info to decide which choice is best
when capacities are not the same. Matching the perf level of the requesting
CPU makes sense when rq_affinity=1.

> Although the actual effect on power probably isn't that significant, given
> that the (e.g. big) CPU has submitted the IO, is woken up soon, so you could
> almost ignore a potential idle wakeup and the actual CPU time spent in the block
> completion is pretty short of course.
> 
> > If no matching is required, it makes sense to set rq_affinity to 0. When
> > matching is enabled, we need to rely on per-task iowait boost to help the
> > requester to run at a bigger CPU, and naturally the completion will follow when
> > rq_affinity=1. If the requester doesn't need the big perf, but the irq
> > triggered on a bigger core, I struggle to understand why it is good for
> > completion to run on bigger core without the requester also being on a similar
> > bigger core to truly maximize perf.
> 
> So first of all, per-task iowait boosting has nothing to do with it IMO.

It has. If the perf is not good because the requester is running on a little
core, the requester needs to move up to ensure the overall IO perf is better.

> Plenty of IO workloads build up utilization perfectly fine.

These ones have no problems, no? They should migrate to a big core and the
completion will follow them when they move.

> I wouldn't consider the setup: requester little perf, irq+completion big perf
> invalid necessarily, it does decrease IO latency for the application.

I didn't say invalid. But it is not something we can guess automatically when
rq_affinity=1. We don't have enough info to judge. The only info we have is
that the requester that originated the request is running at a different perf
level (whether higher or lower), so we follow it.
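
To make the rule concrete, here is a rough userspace sketch of the decision as
described in this thread. It is not the kernel code: struct cpu_info, the enum
and complete_on_requester() are hypothetical names made up for illustration,
and the LLC id and capacity values are assumed to come from somewhere else.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU topology info (not a kernel struct). */
struct cpu_info {
	int llc_id;             /* CPUs with the same id share a last-level cache */
	unsigned long capacity; /* relative compute capacity, e.g. 1024 for a big CPU */
};

/* Illustrative names for the values of the rq_affinity sysfs knob. */
enum rq_affinity {
	RQ_AFF_NONE  = 0, /* complete wherever the irq landed */
	RQ_AFF_GROUP = 1, /* complete in the requester's CPU "group" */
	RQ_AFF_FORCE = 2, /* always complete on the requesting CPU */
};

/*
 * Return true if the completion should be sent over to the requesting CPU,
 * false if it can run locally on the CPU that took the irq.
 */
static bool complete_on_requester(const struct cpu_info *cpus,
				  int irq_cpu, int req_cpu,
				  enum rq_affinity mode)
{
	/* No matching requested: stay on the irq CPU. */
	if (mode == RQ_AFF_NONE)
		return false;

	/* Already on the requester: nothing to move. */
	if (irq_cpu == req_cpu)
		return false;

	/* Forced affinity ignores cache and capacity. */
	if (mode == RQ_AFF_FORCE)
		return true;

	/*
	 * rq_affinity=1: completing locally is only fine when both CPUs share
	 * the LLC *and* have the same capacity. A capacity mismatch in either
	 * direction (higher or lower) means we follow the requester.
	 */
	if (cpus[irq_cpu].llc_id == cpus[req_cpu].llc_id &&
	    cpus[irq_cpu].capacity == cpus[req_cpu].capacity)
		return false;

	return true;
}
```

Note the capacity check is symmetric on purpose: it doesn't try to guess
whether running the completion on the bigger or the smaller CPU is the better
trade-off, it only notices that the perf levels differ.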

> Consider the IO being page faults (maybe even of various applications running
> on little).
> 
> > 
> > By the way, if we assume LLC wasn't the same, then assuming HMP system too, and
> > reverting my patch, then the behavior was to move the completion from bigger
> > core to little core.
> > 
> > So two things to observe:
> > 
> > 1. The patch keeps the behavior when LLC truly is not shared on such systems,
> >    which was in the past.
> > 2. LLC in this case is most likely L2, and the usual trend is that the bigger
> >    the core the bigger L2. So the LLC characteristic is different and could
> >    have impacted performance. No one seem to have cared in the past. I think
> >    capacity gives this notion now implicitly.
>