Re: interconnects on Tegra

Hi,

Adding linux-pm list.

On 11/15/18 18:47, Thierry Reding wrote:
On Thu, Nov 15, 2018 at 01:17:58AM +0000, Krishna Sitaraman wrote:
Thierry, thanks for looping us in.
Reading this thread, I believe Peter has given a brief overview of
how we manage the memory bus clock along with latency & priority
information to the memory subsystem for isochronous clients.  I
want to add to this:
-	The total amount of isochronous bandwidth achievable is limited
and we have to arbitrate the available iso bandwidth at runtime among
the clients.  So our Tegra framework provides a mechanism for clients
to check and then lock a particular iso bandwidth before attempting to
switch to the desired mode which uses it.
-	The Tegra framework also provides a mechanism for the isochronous
clients to pass latency and/or priority information to the memory
arbitration hardware.

The interconnect framework seems to be a good fit and we might be
able to make use of it.  However, there is some additional
functionality we want to request or suggest that would help us in
using the interconnects.

Thanks for the comments!


Listing them out here:

1.	Isochronous bandwidth manager needs to provide feedback to the
clients (consumers) to know if a particular iso bandwidth request is
possible or not before clients can make a definite switch.  For example,
the display wanting to know if a mode is possible before switching to
the new configuration.  The interconnect framework needs a method for the
provider to give feedback to the requesting consumer: a check or
is_possible request before the actual set request.

How would we handle races with other bandwidth requests? Let's say the
display driver requests a check for a certain bandwidth, then gets back
a positive reply, after which it would go make the actual request, but
at that point somebody else might have already requested additional
bandwidth, in turn making the actual request fail.

Would it perhaps be a good idea to just make the actual request fail if
there's not enough bandwidth and allocate the requested bandwidth if it
is available? That way after the consumer gets back a positive reply to
the request it knows that it can use it. On the other hand if the reply
was negative it can downgrade to some other mode and retry.

Any solution that involves two separate steps would probably require
some way of locking the provider until the actual request has been
satisfied, right?

Agree with Thierry.

So icc_set() propagates any non-zero return value back to the caller. The interconnect platform driver could return -EAGAIN or -EBUSY or any other value (remaining bandwidth?) to indicate that the bandwidth is not available. Then this could be handled by the consumer driver (e.g. the display driver would retry with a lower resolution). Does this sound like an option for you?
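
As a rough sketch of what that could look like on the consumer side (the
mode table, field names and error handling here are only illustrative, and
the exact return values are of course still open for discussion):

	/* Illustrative only: try modes from highest to lowest bandwidth
	 * until the provider accepts the request. */
	for (i = 0; i < num_modes; i++) {
		ret = icc_set(disp->icc_path, modes[i].avg_kbps,
			      modes[i].peak_kbps);
		if (!ret)
			break;		/* bandwidth granted, use this mode */
		if (ret != -EBUSY && ret != -EAGAIN)
			return ret;	/* unrelated error, give up */
	}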

Also, if we do move towards something like this, perhaps a better name
for the function to file these requests would be icc_request() or
something like that. This would make it somewhat more obvious that the
function is actually merely a request, rather than a hard "apply"
operation.

If we decide to extend the API with icc_request(), I imagine that the first step would be to call icc_request() on a path, which would try to reserve the requested bandwidth (without actually applying it) until the subsequent icc_set() is called. Something like this is doable, but will require some more thought.
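
Just to illustrate the flow I have in mind (icc_request() and its semantics
are purely hypothetical at this point):

	/* Hypothetical two-step flow: reserve first, commit later. */
	ret = icc_request(path, avg_bw, peak_bw);	/* reserve only */
	if (ret)
		return ret;	/* not enough bandwidth, caller can retry with less */

	/* ... prepare the mode switch ... */

	ret = icc_set(path, avg_bw, peak_bw);		/* commit the reservation */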

Would it also perhaps be useful to have the function return the maximum
amount of bandwidth available on failure? That could provide useful
hints to the consumers about which modes to downgrade to and which will
fail straight away anyway. Then again, this would entail a two-step
approach, so the actual request for the lower bandwidth might still fail
if some other consumer requested additional bandwidth since the failure.

As mentioned above, maybe the value returned by icc_set() could be used as a hint?

2.	How is the peak_bw actually defined and what is the intended
usage?  We need clarity on this.
a.	An existing implementation from Qualcomm seems to do a max of
all peak_bw in their code.  Does this mean that all consumers would
not be using their peak_bw at the same time?  Why is it not a sum of
all peak_bw?  This is not clear to us.

The sum of all peak_bw will tend to significantly over-estimate the actual required bandwidth for the interconnect and also the power requirement. There is an implicit assumption that either the peaks won't always line up, or if they do, the use case is tolerant enough to handle it with the help of QoS hardware.

The aggregation is platform specific, so each platform can specify how the bandwidth is aggregated and adjust accordingly.
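
For example, a provider could implement its aggregation callback roughly
like this (sum of the average bandwidths, max of the peak bandwidths; the
function name is just a placeholder and the callback signature follows the
current RFC, so it may still change):

	static int tegra_icc_aggregate(struct icc_node *node, u32 avg_bw,
				       u32 peak_bw, u32 *agg_avg, u32 *agg_peak)
	{
		/* averages add up, peaks are assumed not to line up */
		*agg_avg += avg_bw;
		*agg_peak = max(*agg_peak, peak_bw);

		return 0;
	}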


3.	In addition to peak_bw and avg_bw, can interconnects support a
floor request on a clock? We need a floor request for clients which
are affected by latency and not so much by bandwidth.  For example,
the CPU is in some cases more sensitive to latency than to bandwidth.
So CPU clients set an EMC floor based on the current CPU frequency to
satisfy a minimum latency requirement.

What would be the difference between a floor request and a request for
peak bandwidth corresponding to that floor frequency? Couldn't the CPU
just register a regular bandwidth request to achieve the same goal? I
mean registering a peak bandwidth that meets the minimum requirements
for the needed latency would ensure that bandwidth never goes below
that value, so it would basically be a floor request. Perhaps this could
also be a special case where peak bandwidth and average bandwidth are
actually the same?

So this depends on the aggregation. In the case above, where we take the max of all the peak_bw values, the peak_bw also operates as a floor request.
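
So a latency-sensitive client could express its floor as a peak bandwidth
request with no average component, something along these lines (the path
and value names are only illustrative):

	/* Latency-sensitive client: no average bandwidth demand, but a
	 * peak_bw value that translates to the required EMC floor. */
	ret = icc_set(cpu_path, 0, cpu_floor_kbps);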


4.	Request to have tracing as a debug option. On every icc_set()
call, print the path and the aggregated avg_bw value.

This could presumably be done with ftrace and would be, in my opinion, a
good addition to this framework.

This is a good idea!
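
A tracepoint for this could look roughly like the sketch below (the exact
fields are open for discussion, and the usual trace header boilerplate is
omitted here):

	TRACE_EVENT(icc_set,
		TP_PROTO(const char *path_name, u32 avg_bw, u32 peak_bw),
		TP_ARGS(path_name, avg_bw, peak_bw),
		TP_STRUCT__entry(
			__string(name, path_name)
			__field(u32, avg_bw)
			__field(u32, peak_bw)
		),
		TP_fast_assign(
			__assign_str(name, path_name);
			__entry->avg_bw = avg_bw;
			__entry->peak_bw = peak_bw;
		),
		TP_printk("path=%s avg_bw=%u peak_bw=%u",
			  __get_str(name), __entry->avg_bw, __entry->peak_bw)
	);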

a.	We also want to know what the request from every client is, at a
given instant, so that we can add test cases to ensure the EMC
calculation code is doing the right thing.  Automated tests can make
use of this.

Would a debugfs interface be useful for this? I could imagine that
something similar to the common clock framework (per-client file with
current requests and an additional bandwidth_summary file with a list
of all requests and a total perhaps) would work very well for this.

There is already an interconnect_summary file, which lists all the requests from drivers with requested/aggregated values.

5.	To support latency & priority programming, some chips need to
pass additional parameters apart from bandwidth or latency
information.  Will the interconnect framework support a mechanism to
pass a private struct (a downstream-defined struct) for the set
operation?  The private struct could be part of the icc_node and
programmed by the consumer.  This would also help to support any
future deviations.

That's slightly suboptimal because this is a generic framework, so one
of the goals is that consumer drivers are agnostic of the specific
provider implementation. So adding provider-specific requests doesn't
make much sense because generic consumers couldn't use them anyway. I
suppose this could be made to work for cases where we know that the
provider and consumer are always tightly coupled, but it could make
things fairly complicated.

We should avoid using private data structs.

All of that said, I would consider latency and priority generic enough
concepts to fit into this framework natively. Latency and priority can
be just another type of request that can be registered. Depending on
how this develops it might be better to move away from passing request
parameters individually and instead move to something like a structure
based approach:

	struct icc_request {
		struct {
			unsigned long average;
			unsigned long peak;
		} bandwidth;

		struct {
			unsigned long minimum;
			unsigned long maximum;
		} latency;

		unsigned long priority;
	};

Yes, I have thought about this and it is planned as a next step. It's really great that you guys are providing feedback and helping to move this forward.
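
For reference, a consumer call might then end up looking something like
this (icc_set_request() and the values are of course just placeholders,
nothing is decided yet):

	/* Hypothetical structure-based request; nothing like this exists yet. */
	struct icc_request req = {
		.bandwidth = { .average = 100000, .peak = 500000 },
		.latency = { .maximum = 1000 },
		.priority = 2,
	};

	ret = icc_set_request(path, &req);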

6.	When will the latency part of the interconnects framework be
implemented?  What features is it adding?

Since you and Sanjay are most familiar with how this works and what
exactly the requirements are, perhaps this is something that you could
prototype as part of an attempt to implement a Tegra provider for the
interconnect framework? That way you can determine what exactly gets
added based on our requirements and Georgi can provide feedback on the
proposed solution and how it fits into the bigger picture.

Sure! Thanks!

BR,
Georgi


