Re: interconnects on Tegra

Hi Thierry and Krishna,

On 11/17/18 01:16, Krishna Sitaraman wrote:
My responses inline. (I am using Outlook and currently my responses
are marked with [Krishna S]. In the near future I will switch to a more
Linux-friendly quoting style, so please bear with me until then.)

-----Original Message-----
From: Thierry Reding <thierry.reding@xxxxxxxxx>
Sent: Friday, November 16, 2018 3:54 AM
To: Georgi Djakov <georgi.djakov@xxxxxxxxxx>
Cc: Krishna Sitaraman <ksitaraman@xxxxxxxxxx>; Jonathan Hunter <jonathanh@xxxxxxxxxx>; Peter De Schrijver <pdeschrijver@xxxxxxxxxx>; Dmitry Osipenko <digetx@xxxxxxxxx>; linux-tegra <linux-tegra@xxxxxxxxxxxxxxx>; Sanjay Chandrashekara <sanjayc@xxxxxxxxxx>; Linux PM list <linux-pm@xxxxxxxxxxxxxxx>
Subject: Re: interconnects on Tegra

On Fri, Nov 16, 2018 at 01:06:46PM +0200, Georgi Djakov wrote:
Hi,

Adding linux-pm list.

On 11/15/18 18:47, Thierry Reding wrote:
On Thu, Nov 15, 2018 at 01:17:58AM +0000, Krishna Sitaraman
wrote:
Thierry, thanks for looping us in. Reading this thread, I believe
Peter has given a brief overview of how we manage the memory bus
clock along with latency & priority information to the memory
subsystem for isochronous clients. I want to add to this:
- The total amount of isochronous bandwidth achievable is limited and
we have to arbitrate the available iso bandwidth at runtime among the
clients. So our Tegra framework provides a mechanism for clients to
check and then lock a particular iso bandwidth before attempting to
switch to the desired mode which uses it.
- The Tegra framework also provides a mechanism for the isochronous
clients to pass latency and/or priority information to the memory
arbitration hardware.

The interconnect framework seems to be a good fit and we might be
able to make use of it. However, there is certain additional
functionality we would like to request or suggest that would help us
use the interconnects.

Thanks for the comments!


Listing them out here:

1. The isochronous bandwidth manager needs to provide feedback to the
clients (consumers) so they know whether a particular iso bandwidth
request is possible before they make a definite switch, for example
the display wanting to know if a mode is possible before switching to
the new configuration. The interconnect framework needs a method for
the provider to give feedback to the requesting consumer: a check or
is_possible request before the actual set request.

How would we handle races with other bandwidth requests? Let's
say the display driver requests a check for a certain bandwidth,
then gets back a positive reply, after which it would go make the
actual request, but at that point somebody else might have
already requested additional bandwidth, in turn making the actual
request fail.

Would it perhaps be a good idea to just make the actual request
fail if there's not enough bandwidth and allocate the requested
bandwidth if it is available? That way after the consumer gets
back a positive reply to the request it knows that it can use it.
On the other hand if the reply was negative it can downgrade to
some other mode and retry.

Any solution that involves two separate steps would probably
require some way of locking the provider until the actual request
has been satisfied, right?

Agree with Thierry.

So icc_set() propagates any non-zero return value back to the
caller. The interconnect platform driver could return -EAGAIN or
-EBUSY or any other value (remaining bandwidth?) to indicate that
the bandwidth is not available. This could then be handled by the
consumer driver (e.g. the display driver would retry with a lower
resolution). Does this sound like an option for you?
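For illustration, a consumer-side retry could look roughly like the
sketch below. Only icc_get()/icc_set()/icc_put() are from the
proposed API; the display structures and the mode_to_iso_bw_kbps()
helper are made-up names, and the error codes are just the ones
mentioned above.

/*
 * Illustrative sketch only: walk the mode list from the preferred
 * mode downwards and retry with less bandwidth when the provider
 * rejects the request. "struct display", "struct display_mode" and
 * mode_to_iso_bw_kbps() are hypothetical.
 */
static int display_pick_mode(struct display *disp)
{
	struct icc_path *path = disp->icc_path;	/* obtained via icc_get() */
	unsigned int i;
	int ret;

	for (i = 0; i < disp->num_modes; i++) {
		u32 bw = mode_to_iso_bw_kbps(&disp->modes[i]);

		/* Request avg == peak for a hard isochronous requirement. */
		ret = icc_set(path, bw, bw);
		if (!ret) {
			disp->current_mode = &disp->modes[i];
			return 0;
		}

		if (ret != -EBUSY && ret != -EAGAIN)
			return ret;	/* real error, give up */

		/* Not enough iso bandwidth: fall through to a lower mode. */
	}

	return -ENOSPC;
}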


[Krishna S] We do handle the race condition. Internally, in Tegra, we
call these functions reserve() and realize(). During a reserve() call,
the client requests a certain iso bandwidth and, if it is available,
the iso bandwidth manager reserves the requested bandwidth for this
client and responds with a positive reply. The reserved bandwidth is
treated as committed and is not allocated to other clients. When the
client makes the realize() call, that is when the actual memory
parameters are modified according to the previously reserved value.
The critical part of the reserve() function is protected by locks in
order to avoid race conditions. If icc_set() can propagate a non-zero
return value, that is a good option to consider.

Ok, good. Thanks for confirming!
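Just to make sure I follow, the flow is roughly something like the
pseudo-code below? This is purely illustrative and not the actual
Tegra API; all structure and function names are made up.

/*
 * Illustrative two-phase reserve()/realize() flow. The lock protects
 * the bookkeeping so that two clients cannot both be promised the
 * same iso bandwidth.
 */
static int isomgr_reserve(struct isomgr *mgr, struct isomgr_client *client,
			  u32 bw_kbps)
{
	int ret = 0;

	mutex_lock(&mgr->lock);
	if (bw_kbps <= mgr->avail_iso_bw_kbps) {
		mgr->avail_iso_bw_kbps -= bw_kbps;	/* commit it now */
		client->reserved_bw_kbps = bw_kbps;
	} else {
		ret = -EBUSY;	/* caller may retry with a smaller request */
	}
	mutex_unlock(&mgr->lock);

	return ret;
}

static void isomgr_realize(struct isomgr *mgr, struct isomgr_client *client)
{
	/*
	 * Program the memory arbitration hardware (latency, priority,
	 * EMC floor) based on client->reserved_bw_kbps.
	 */
	program_memory_arbiter(mgr, client);	/* hypothetical helper */
}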

Also, if we do move towards something like this, perhaps a
better name for the function to file these requests would be
icc_request() or something like that. This would make it somewhat
more obvious that the function is actually merely a request,
rather than a hard "apply" operation.

If we decide to extend the API with icc_request(), I imagine that
the first step would be to call icc_request() on a path which will
try to reserve the requested bandwidth (without actual request)
until the subsequent icc_set() is called. Something like this is
doable, but will require some more thought.

I think I may have been unclear. My suggestion was to rename
icc_set() to icc_request() given that it is a request that can fail.
Admittedly this is splitting hairs a little, and I don't feel
strongly about it.

Thanks for clarifying, Thierry. Whether a request could fail or not is also platform specific. I prefer icc_set() as it is shorter and seems more natural to have it in addition to icc_get() and icc_put().

Having a single call to request bandwidth/latency/priority seems like
the easiest API. It ensures that when icc_set() returns 0 (success),
the bandwidth will already have been allocated for the request, so
the consumer can continue assuming that it will be able to get enough
bandwidth. If for any reason it ends up not needing the bandwidth
after all, or needing less, it can simply call icc_set() again with
different parameters.

[Krishna S] Having a single call to request
bandwidth/latency/priority as suggested above seems like an ideal
method. We are having an internal discussion to see if our legacy
drivers can handle an overhaul to the new method. Some of them do
other preparations between the reserve() and realize() API calls, so
this needs a deeper dive, but in concept I believe a single-call
method sounds cleaner. We will get back on this next week.

Alright, this sounds good to me. How do you define the latency - is it a round-trip latency between the endpoints? Do you use just a single required latency value or a range of min/max? What units do you use?

Would it also perhaps be useful to have the function return the
maximum amount of bandwidth available on failure? That could provide
useful hints to the consumers about which modes to downgrade to and
which will fail straight away anyway. Then again, this would entail a
two-step approach, so the actual request for the lower bandwidth
might still fail if some other consumer requested additional
bandwidth since the failure.

As mentioned above, maybe the value returned by icc_set() could be used as a hint?


[Krishna S] We like the idea of using the icc_set() return value as a
hint, but the client needs to be careful here. The available
bandwidth is very dynamic and a hint may become stale before the
client acts on it. However, the hint can be designed to be used in a
constructive way by each platform.

2. How is peak_bw actually defined and what is the intended usage?
We need clarity on this.
a. An existing implementation from Qualcomm seems to take the max of
all peak_bw values in their code. Does this mean that all consumers
would not be using their peak_bw at the same time? Why is it not a
sum of all peak_bw? This is not clear to us.

The sum of all peak_bw values will tend to significantly
over-estimate the actual required bandwidth for the interconnect and
also the power requirement. There is an implicit assumption that
either the peaks won't always line up, or that if they do, the use
case is tolerant enough to handle it with the help of the QoS
hardware.

The aggregation is platform specific, so each platform can specify
how the bandwidth is aggregated and adjust accordingly.

That sounds reasonable to me. I would expect that clients which have
a very hard requirement (such as display) would pass the same
bandwidth for both peak and average values. I would also presume that
an implementation would sum up all average requests to get at the
total requested bandwidth needed and use peak values for perhaps
additional margin or so.

[Krishna S] The explanation above makes sense to me given the
implicit assumptions.  Can any part of this be added to the
documentation?

Ok sure. I expect that different SoCs can differ in how they aggregate requests. Will add some more information about that.
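For example, one common scheme is to sum the average requests and
take the maximum of the peak requests. A rough sketch of such an
aggregation is below (generic pseudo-code, not the actual framework
or Qualcomm callback); note that with this scheme a client's peak_bw
also acts as a floor for the node.

/*
 * Sketch of sum-of-average / max-of-peak aggregation. The structures
 * are illustrative; the real hooks are platform specific.
 */
struct node_bw {
	u32 avg_kbps;
	u32 peak_kbps;
};

static void aggregate_requests(struct node_bw *agg,
			       const struct node_bw *requests, int count)
{
	int i;

	agg->avg_kbps = 0;
	agg->peak_kbps = 0;

	for (i = 0; i < count; i++) {
		/* Averages add up over time... */
		agg->avg_kbps += requests[i].avg_kbps;
		/* ...while peaks are assumed not to line up, so take the max. */
		if (requests[i].peak_kbps > agg->peak_kbps)
			agg->peak_kbps = requests[i].peak_kbps;
	}
}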

3. In addition to peak_bw and avg_bw, can interconnects support a
floor request on a clock? We need a floor request for clients which
are affected by latency and not so much by bandwidth. For example,
the CPU is in some cases more latency-sensitive than
bandwidth-sensitive, so CPU clients set an EMC floor based on the
current CPU frequency to satisfy a minimum latency need.

What would be the difference between a floor request and a
request for peak bandwidth corresponding to that floor frequency?
Couldn't the CPU just register a regular bandwidth request to
achieve the same goal? I mean registering a peak bandwidth that
meets the minimum requirements for the needed latency would
ensure that bandwidth never goes below that value, so it would
basically be a floor request. Perhaps this could also be a
special case where peak bandwidth and average bandwidth are
actually the same?

So this depends on the aggregation. In the case above, if we take the
max of all peak_bw values, peak_bw also operates as a floor request.


[Krishna S] Understood that we can repurpose peak_bw and use it for
floor purposes. But the name is very misleading if we end up using it
that way. Hence the request for an explicit floor value; platforms
could then use peak_bw, floor_bw, or both.

I think this depends on the point of view. I was not able to come up with something more generic than this, but I am open to suggestions.
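For reference, with max-of-peak aggregation a latency-sensitive
client could express its floor roughly like this (a sketch; the
cpu_khz_to_emc_floor_kbps() helper is made up):

/*
 * Illustrative: a CPU latency client using peak_bw as a floor request
 * while not contributing to the summed average bandwidth.
 */
static int cpu_update_emc_floor(struct icc_path *path, unsigned long cpu_khz)
{
	u32 floor_kbps = cpu_khz_to_emc_floor_kbps(cpu_khz);	/* hypothetical */

	/*
	 * avg_bw = 0: no sustained bandwidth demand.
	 * peak_bw = floor: keeps the aggregated value (max of all peaks)
	 * from dropping below what the latency target needs.
	 */
	return icc_set(path, 0, floor_kbps);
}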

4. Request to have tracing as a debug option: on every icc_set()
call, print the path and the aggregated avg_bw value.

This could presumably be done with ftrace and would be, in my opinion, a good addition to this framework.

This is a good idea!
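A minimal tracepoint along the lines below could cover it (just a
sketch of what such a tracepoint might look like, not something that
exists in the framework today; names and fields simply follow the
request above):

/* Illustrative tracepoint header; names are made up. */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM interconnect

#if !defined(_TRACE_INTERCONNECT_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_INTERCONNECT_H

#include <linux/tracepoint.h>

TRACE_EVENT(icc_set,
	TP_PROTO(const char *path_name, u32 avg_bw, u32 peak_bw, u32 agg_avg),
	TP_ARGS(path_name, avg_bw, peak_bw, agg_avg),

	TP_STRUCT__entry(
		__string(path, path_name)
		__field(u32, avg_bw)
		__field(u32, peak_bw)
		__field(u32, agg_avg)
	),

	TP_fast_assign(
		__assign_str(path, path_name);
		__entry->avg_bw = avg_bw;
		__entry->peak_bw = peak_bw;
		__entry->agg_avg = agg_avg;
	),

	TP_printk("path=%s avg_bw=%u peak_bw=%u agg_avg=%u",
		  __get_str(path), __entry->avg_bw, __entry->peak_bw,
		  __entry->agg_avg)
);

#endif /* _TRACE_INTERCONNECT_H */

/* This part must be outside the include guard */
#include <trace/define_trace.h>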

a. We also want to know what the request from every client is, at a
given instant, so that we can add test cases to ensure the EMC
calculation code is doing the right thing. Automated tests can make
use of this.

Would a debugfs interface be useful for this? I could imagine
that something similar to the common clock framework (per-client
file with current requests and an additional bandwidth_summary
file with a list of all requests and a total perhaps) would work
very well for this.

There is already an interconnect_summary file, which lists all the requests from drivers with requested/aggregated values.

Looks exactly like what I had in mind. Krishna, Sanjay, does that
file look like something that we could use for testing? Would we need
additional information in it?

[Krishna S] Yes, if this exists then that is exactly what we need. If
the requested values include all the fields (client id, avg_bw,
peak_bw, etc.) and the final value calculated by the provider, that
is all the information we need.

Ok, great!

Thanks,
Georgi


