On Mon, Jun 17, 2019 at 8:44 AM Georgi Djakov <georgi.djakov@xxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On 6/14/19 07:17, Saravana Kannan wrote:
> > Add a icc_create_devfreq() and icc_remove_devfreq() to create and remove
> > devfreq devices for interconnect paths. A driver can create/remove devfreq
> > devices for the interconnects needed for its device by calling these APIs.
> > This would allow various devfreq governors to work with interconnect paths
> > and the device driver itself doesn't have to actively manage the bandwidth
> > votes for the interconnects.
>
> Thanks for the patches, but creating devfreq devices for each interconnect path
> seems odd to me - at least for consumers that already use a governor.

Each governor instance always handles one "frequency" (really, performance) domain at a time. So if a consumer is already using a governor to scale the hardware block itself, then using another governor to scale the interconnect performance points is the right way to go about it. In fact, that's exactly what the devfreq passive governor's documentation says it's meant for. It's also what cpufreq does for each cluster/CPU frequency domain.

> So for DDR
> scaling for example, are you suggesting that we add a devfreq device from the
> cpufreq driver in order to scale the interconnect between CPU<->DDR?

Yes, in general. Although CPUs are a special case because CPUs don't go through devfreq, so the passive governor as it stands today won't work. CPU<->DDR scaling might need a separate governor (unlikely) or some changes to the passive governor that I'm happy to work on once we settle this for general devices like GPUs, etc. But the DT format for CPUs will be identical to GPUs or any other device.

> Also if the
> GPU is already using devfreq, should we add a devfreq per each interconnect
> path? What would be the benefit in this case - using different governors for
> bandwidth scaling maybe?

When I say "separate/different governors" in this email, I mean both different instances of the same governor logic with different tunables AND entirely different algorithms/governor logic.

The heuristics to use for each interconnect path might be (more like, will be) different based on hardware characteristics (e.g. which voltage domains the interconnect sits on) and what information is available for the interconnect (e.g. just busy time vs. bandwidth counters vs. no information at all) -- so having a separate governor for each interconnect path makes a lot of sense.

It also allows userspace to control the policy for scaling each of those paths based on product use cases. For example, when the GPU is just doing simple UI rendering, userspace can use the devfreq device's max_freq sysfs file to disallow high bandwidth OPPs on the GPU<->DDR path, but allow those higher OPPs when the GPU is used for games.

Having a devfreq device for each interconnect path also makes it easy to debug performance issues -- you can independently change the votes for each path to figure out what is causing the bottleneck, etc.

Adding a devfreq device for interconnect voting with a few lines of code gives all these features "for free".

This doesn't mean all users of the interconnect framework NEED to use devfreq for interconnects. They might set their bandwidth votes with simple calculations based on the use case (e.g. a display driver deriving bandwidth from the display resolution). But if they are trying to use any kind of algorithm/heuristics, writing it as a devfreq governor should be encouraged.
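To make the consumer side concrete, here's a rough sketch of what a GPU driver's probe could look like with this series. This is only an illustration: the icc_create_devfreq() arguments and return value below are simplified for the example (see the actual patch for the real signature), and the "gpu-mem" path name is made up.

#include <linux/err.h>
#include <linux/interconnect.h>
#include <linux/platform_device.h>

static int my_gpu_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct icc_path *path;
	int ret;

	/* The path is described in DT via interconnects/interconnect-names */
	path = of_icc_get(dev, "gpu-mem");
	if (IS_ERR(path))
		return PTR_ERR(path);

	/*
	 * Hand the path over to the proposed API so a devfreq device is
	 * created for it and a governor (e.g. passive) manages the bandwidth
	 * votes. The signature here (returning 0 or -errno) is assumed for
	 * the sake of the example.
	 */
	ret = icc_create_devfreq(dev, path);
	if (ret)
		icc_put(path);

	return ret;
}

With that in place, the driver never has to call icc_set_bw() on this path itself; the governor attached to the new devfreq device does the voting.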
Also want to point out that BW OPPs also work for drivers that don't use devfreq at all. The interconnect-opp-table just lists the meaningful OPP levels for the path and the device driver can pick one entry from the table based on the use case.

Thanks,
Saravana