Re: interconnects on Tegra

On Fri, Nov 16, 2018 at 01:06:46PM +0200, Georgi Djakov wrote:
> Hi,
> 
> Adding linux-pm list.
> 
> On 11/15/18 18:47, Thierry Reding wrote:
> > On Thu, Nov 15, 2018 at 01:17:58AM +0000, Krishna Sitaraman wrote:
> > > Thierry, thanks for looping us in.
> > >      Reading this thread, I believe Peter has given a brief overview of
> > >      how we manage the memory bus clock along with latency & priority
> > >      information to the memory subsystem for isochronous clients.  I
> > >      want to add to this:
> > > -	The total amount of isochronous bandwidth achievable is limited
> > > and we have to arbitrate the available iso bandwidth at runtime among
> > > the clients.  So our Tegra framework provides a mechanism for clients
> > > to check and then lock a particular iso bandwidth before attempting to
> > > switch to the desired mode which uses it.
> > > - The Tegra framework also provides a mechanism for the isochronous
> > > clients to set latency and/or priority information to the memory
> > > arbitration hardware.
> > > 
> > >     The interconnect framework seems to be a good fit and we might be
> > >     able to make use of it.  However, there is certain additional
> > >     functionality we want to request or suggest that can help in using
> > >     the interconnects.
> 
> Thanks for the comments!
> 
> > > 
> > > Listing them out here:
> > > 
> > > 1. The isochronous bandwidth manager needs to provide feedback to the
> > > clients (consumers) so they know whether a particular iso bandwidth
> > > request is possible before making a definite switch.  For example, the
> > > display wanting to know if a mode is possible before switching to the
> > > new configuration.  The interconnect framework needs a method for the
> > > provider to give feedback to the requesting consumer: a check or
> > > is_possible request before the actual set request.
> > 
> > How would we handle races with other bandwidth requests? Let's say the
> > display driver requests a check for a certain bandwidth, then gets back
> > a positive reply, after which it would go make the actual request, but
> > at that point somebody else might have already requested additional
> > bandwidth, in turn making the actual request fail.
> > 
> > Would it perhaps be a good idea to just make the actual request fail if
> > there's not enough bandwidth and allocate the requested bandwidth if it
> > is available? That way after the consumer gets back a positive reply to
> > the request it knows that it can use it. On the other hand if the reply
> > was negative it can downgrade to some other mode and retry.
> > 
> > Any solution that involves two separate steps would probably require
> > some way of locking the provider until the actual request has been
> > satisfied, right?
> 
> Agree with Thierry.
> 
> So icc_set() propagates any non-zero return value back to the caller. The
> interconnect platform driver could return -EAGAIN or -EBUSY or any other
> value (remaining bandwidth?) to indicate that the bandwidth is not
> available. Then this could be handled by the consumer driver (e.g. the
> display driver will retry with a lower resolution). Does this sound like
> an option for you?
> 
> > Also, if we do move towards something like this, perhaps a better name
> > for the function to file these requests would be icc_request() or
> > something like that. This would make it somewhat more obvious that the
> > function is actually merely a request, rather than a hard "apply"
> > operation.
> 
> If we decide to extend the API with icc_request(), I imagine that the
> first step would be to call icc_request() on a path, which would try to
> reserve the requested bandwidth (without actually applying it) until the
> subsequent icc_set() is called. Something like this is doable, but will
> require some more thought.

I think I may have been unclear. My suggestion was to rename icc_set()
to icc_request() given that it is a request that can fail. Admittedly
this is splitting hairs a little, and I don't feel strongly about it.

Having a single call to request bandwidth/latency/priority seems like
the easiest API. It ensures that when icc_set() returns 0 (success), the
bandwidth will already have been allocated for the request, so the
consumer can continue assuming that it will be able to get enough
bandwidth. If for any reason it ends up not needing the bandwidth after
all, or needing less, it can simply call icc_set() again with different
parameters.
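
To make that concrete, here is a rough consumer-side sketch of the retry
flow, assuming the icc_set(path, avg_bw, peak_bw) signature from the
series. The mode table, the bw field and display_pick_mode() are made up
purely for illustration:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/interconnect.h>

struct display_mode {
	u32 xres, yres, refresh;
	u32 bw;		/* bandwidth needed for this mode, in the
			 * framework's units */
};

static int display_pick_mode(struct device *dev, struct icc_path *path,
			     const struct display_mode *modes, int n_modes)
{
	int i, ret;

	/* modes are assumed to be sorted from highest to lowest bandwidth */
	for (i = 0; i < n_modes; i++) {
		/* isochronous client: average == peak for a hard guarantee */
		ret = icc_set(path, modes[i].bw, modes[i].bw);
		if (!ret)
			return i;	/* bandwidth allocated, use this mode */

		dev_dbg(dev, "mode %ux%u@%u rejected (%d), trying lower mode\n",
			modes[i].xres, modes[i].yres, modes[i].refresh, ret);
	}

	return -ENOSPC;	/* no mode fits the remaining ISO bandwidth */
}

If the request for the preferred mode fails, the driver simply moves on to
the next, less demanding mode, and whatever a successful icc_set() call
allocated stays allocated until the driver calls icc_set() again (or
releases the path).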

> > Would it also perhaps be useful to have the function return the maximum
> > amount of bandwidth available on failure? That could provide useful
> > hints to the consumers about which modes to downgrade to and which will
> > fail straight away anyway. Then again, this would entail a two-step approach,
> > so the actual request for the lower bandwidth might still fail if some
> > other consumer requested additional bandwidth since the failure was
> > reported.
> 
> As mentioned above, maybe the value returned by icc_set() could be used
> as a hint?
> 
> > > 2. How is the peak_bw actually defined and what is the intended
> > > usage?  We need clarity on this.
> > > a. An existing implementation from Qualcomm seems to take the max of
> > > all peak_bw values in their code.  Does this mean that all consumers
> > > would not be using their peak_bw at the same time?  Why is it not a
> > > sum of all peak_bw?  This is not clear to us.
> 
> The sum of all peak_bw will tend to significantly over-estimate the actual
> required bandwidth for the interconnect and also the power requirement.
> There is an implicit assumption that either the peaks won't always line
> up, or if they do, the use case is tolerant enough to handle it with the
> help of QoS hardware.
> 
> The aggregation is platform specific, so each platform can specify how the
> bandwidth is aggregated and adjust accordingly.

That sounds reasonable to me. I would expect that clients which have a
very hard requirement (such as display) would pass the same bandwidth
for both peak and average values. I would also presume that an
implementation would sum up all average requests to arrive at the total
requested bandwidth and use the peak values perhaps for some additional
margin or so.
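
Concretely, I'd imagine the platform's aggregation looking roughly like
this (a sketch only; tegra_emc_icc_aggregate is a made-up name and the
exact aggregate() callback signature should be taken from the series):

#include <linux/interconnect-provider.h>
#include <linux/kernel.h>

static int tegra_emc_icc_aggregate(struct icc_node *node,
				   u32 avg_bw, u32 peak_bw,
				   u32 *agg_avg, u32 *agg_peak)
{
	/* total sustained bandwidth is the sum of all average requests */
	*agg_avg += avg_bw;

	/* worst-case requirement is the largest individual peak request */
	*agg_peak = max(*agg_peak, peak_bw);

	return 0;
}

With max-based peak aggregation, a single large peak_bw request is enough
to keep the interconnect clocked high, which also ties in with the floor
discussion below.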

> > > 3. In addition to peak_bw and avg_bw, can interconnects support a
> > > floor request on a clock? We need a floor request for clients which
> > > are affected by latency and not so much by bandwidth.  For example,
> > > the CPU is in some cases more latency-sensitive than
> > > bandwidth-sensitive, so CPU clients set an EMC floor based on the
> > > current CPU frequency to satisfy a minimum latency need.
> > 
> > What would be the difference between a floor request and a request for
> > peak bandwidth corresponding to that floor frequency? Couldn't the CPU
> > just register a regular bandwidth request to achieve the same goal? I
> > mean registering a peak bandwidth that meets the minimum requirements
> > for the needed latency would ensure that bandwidth never goes below
> > that value, so it would basically be a floor request. Perhaps this could
> > also be a special case where peak bandwidth and average bandwidth are
> > actually the same?
> 
> So this depends on the aggregation. In the case above, if we take the max
> of all the peak_bw values, the peak_bw also operates as a floor request.
> 
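
Right, so a latency-driven client like the CPU could express its floor
purely via peak_bw, something along these lines (both function names are
made up and the frequency-to-bandwidth conversion is a placeholder):

#include <linux/interconnect.h>

/* placeholder conversion from CPU frequency to the EMC bandwidth floor */
static u32 cpu_khz_to_emc_bw(unsigned long cpu_khz)
{
	return cpu_khz / 2;	/* platform-specific lookup in reality */
}

static int tegra_cpu_set_emc_floor(struct icc_path *path,
				   unsigned long cpu_khz)
{
	/*
	 * No sustained bandwidth is requested (avg_bw = 0); the peak
	 * request, aggregated with max(), effectively becomes a floor.
	 */
	return icc_set(path, 0, cpu_khz_to_emc_bw(cpu_khz));
}
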
> > 
> > > 4. Request to have tracing as a debug option. On every icc_set()
> > > call, print the path and the aggregated avg bw value.
> > 
> > This could presumably be done with ftrace and would be, in my opinion, a
> > good addition to this framework.
> 
> This is a good idea!
> 
> > > a. We also want to know what the request from every client is, at a
> > > given instant, so that we can add test cases to ensure the EMC
> > > calculation code is doing the right thing.  Automated tests can make
> > > use of this.
> > 
> > Would a debugfs interface be useful for this? I could imagine that
> > something similar to the common clock framework (per-client file with
> > current requests and an additional bandwidth_summary file with a list
> > of all requests and a total perhaps) would work very well for this.
> 
> There is already an interconnect_summary file, which lists all the requests
> from drivers with requested/aggregated values.

Looks exactly like what I had in mind. Krishna, Sanjay, does that file
look like something that we could use for testing? Would we need
additional information in it?

Thierry
