RE: interconnects on Tegra

Thierry, thanks for looping us in. 
  
Reading this thread, I believe Peter has given a brief overview of how we manage the memory bus clock, along with the latency & priority information passed to the memory subsystem for isochronous clients. I want to add to this:
- The total amount of achievable isochronous bandwidth is limited, so we have to arbitrate the available iso bandwidth among the clients at runtime. Our Tegra framework therefore provides a mechanism for clients to check, and then lock, a particular iso bandwidth before attempting to switch to the desired mode that uses it (a rough sketch of this flow follows the list below).
- The Tegra framework also provides a mechanism for isochronous clients to pass latency and/or priority information to the memory arbitration hardware.
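
To make this concrete, here is a minimal sketch of how an isochronous client would use both mechanisms downstream. The names (struct iso_client, iso_bw_reserve(), iso_bw_realize(), iso_set_latency()) are placeholders invented for this example and do not match the actual downstream API:

/*
 * Placeholder sketch of the downstream check-then-lock flow; the names
 * are invented for illustration only.
 */
static int tegra_iso_client_switch(struct iso_client *client,
                                   u32 requested_kbps, u32 max_latency_us)
{
        int err;

        /* Step 1: check whether the requested iso bandwidth is available. */
        err = iso_bw_reserve(client, requested_kbps);
        if (err)
                return err;     /* not enough iso bandwidth, keep the old mode */

        /* Step 2: lock the reservation and pass latency info to the arbiter. */
        err = iso_bw_realize(client);
        if (err)
                return err;

        iso_set_latency(client, max_latency_us);

        return 0;       /* safe to switch to the new mode now */
}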

   The interconnect framework seems to be a good fit and we might be able to make use of it. However, there is some additional functionality we would like to request or suggest that would help us adopt the interconnect framework.

Listing these requests here:

1. The isochronous bandwidth manager needs to give feedback to its clients (consumers) so that they know whether a particular iso bandwidth request can be satisfied before they commit to a switch. For example, display wants to know whether a mode is possible before switching to the new configuration. The interconnect framework therefore needs a way for a provider to give feedback to the requesting consumer: a check or is_possible request before the actual set request, as sketched below.
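
For illustration only: icc_check_bw() below is a name we made up for the kind of query call we are asking for, it does not exist in the current patches; icc_set() is used as we understand the proposed API.

static int tegra_dc_set_mode(struct icc_path *path,
                             u32 new_avg_kbps, u32 new_peak_kbps)
{
        int err;

        /* Ask the provider whether this bandwidth can be guaranteed... */
        err = icc_check_bw(path, new_avg_kbps, new_peak_kbps);
        if (err)
                return err;     /* mode not possible, keep the current mode */

        /* ...and only then commit the request and program the new mode. */
        return icc_set(path, new_avg_kbps, new_peak_kbps);
}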

2. How is peak_bw actually defined and what is its intended usage? We need clarity on this.
a. An existing implementation from Qualcomm seems to take the max of all peak_bw values in its aggregation code. Does this mean the assumption is that consumers never use their peak_bw at the same time? Why is it not the sum of all peak_bw values? This is not clear to us; the two interpretations are sketched below.
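
A minimal sketch of the two possible aggregation policies, written as a provider callback (the callback shape is our reading of the proposed API; the function name is illustrative):

static int example_aggregate(struct icc_node *node, u32 avg_bw, u32 peak_bw,
                             u32 *agg_avg, u32 *agg_peak)
{
        /* Average bandwidth is summed over all requests on the node. */
        *agg_avg += avg_bw;

        /*
         * Interpretation A (what the Qualcomm driver appears to do): take
         * the maximum, i.e. assume consumers never peak at the same time.
         */
        *agg_peak = max(*agg_peak, peak_bw);

        /*
         * Interpretation B (conservative): sum the peaks, i.e. assume all
         * consumers can peak simultaneously:
         *
         *      *agg_peak += peak_bw;
         */

        return 0;
}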

3. In addition to peak_bw and avg_bw, can the interconnect framework support a floor request on a clock? We need a floor request for clients that are sensitive to latency rather than bandwidth. For example, the CPU is in some cases more latency sensitive than bandwidth hungry, so the CPU client sets an EMC floor based on its current CPU frequency to satisfy a minimum latency requirement, as illustrated below.
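
Purely to illustrate the downstream behaviour we would like to keep (the table values and function name below are made-up examples, not taken from any real board):

#include <linux/kernel.h>

/* Illustrative CPU-to-EMC floor table; frequencies are examples only. */
static const struct {
        unsigned long cpu_khz;
        unsigned long emc_floor_khz;
} cpu_emc_floor[] = {
        {  510000, 204000 },
        { 1020000, 408000 },
        { 2014500, 800000 },
};

/* Return the minimum EMC frequency needed for the current CPU frequency. */
static unsigned long tegra_cpu_emc_floor(unsigned long cpu_khz)
{
        unsigned int i;

        for (i = 0; i < ARRAY_SIZE(cpu_emc_floor); i++)
                if (cpu_khz <= cpu_emc_floor[i].cpu_khz)
                        return cpu_emc_floor[i].emc_floor_khz;

        /* Very high CPU frequencies get the highest floor. */
        return cpu_emc_floor[ARRAY_SIZE(cpu_emc_floor) - 1].emc_floor_khz;
}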

4. Request to have tracing as a debug option: on every icc_set() call, print the path and the aggregated avg_bw value.
a. We also want to know what the request from every client is at a given instant, so that we can add test cases to ensure the EMC calculation code is doing the right thing. Automated tests can make use of this; a possible trace event is sketched below.
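
Something like the following trace event would be enough for our purposes (a sketch of what we are asking for, not code from the current patches):

#undef TRACE_SYSTEM
#define TRACE_SYSTEM interconnect

#if !defined(_TRACE_INTERCONNECT_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_INTERCONNECT_H

#include <linux/tracepoint.h>

/* Emitted from icc_set(): which path was set and the aggregated result. */
TRACE_EVENT(icc_set,
        TP_PROTO(const char *path_name, u32 avg_bw, u32 agg_avg),
        TP_ARGS(path_name, avg_bw, agg_avg),
        TP_STRUCT__entry(
                __string(path_name, path_name)
                __field(u32, avg_bw)
                __field(u32, agg_avg)
        ),
        TP_fast_assign(
                __assign_str(path_name, path_name);
                __entry->avg_bw = avg_bw;
                __entry->agg_avg = agg_avg;
        ),
        TP_printk("path=%s avg_bw=%u agg_avg_bw=%u",
                  __get_str(path_name), __entry->avg_bw, __entry->agg_avg)
);

#endif /* _TRACE_INTERCONNECT_H */

/* This part must be outside the include guard. */
#include <trace/define_trace.h>

A test could then enable the event through tracefs, replay a known sequence of client requests and compare the aggregated value against an independent calculation.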

5. To support latency & priority programming, some chips need to pass additional parameters beyond bandwidth or latency information. Will the interconnect framework support a mechanism to pass a private struct (a downstream-defined struct) to the set operation? The private struct could be part of the icc_node and be programmed by the consumer. This would also help support any future deviations; a rough sketch follows below.
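
Roughly what we have in mind, assuming an opaque pointer could be carried by the node; the struct below and the "data" field are hypothetical and named here only for illustration:

/* Downstream-defined per-client data a Tegra consumer would attach. */
struct tegra_icc_client_data {
        u32 latency_us;         /* maximum tolerated latency */
        u32 priority;           /* memory arbitration priority tier */
        bool isochronous;       /* request needs guaranteed bandwidth */
};

/*
 * Hypothetical usage: the consumer fills in the struct and attaches it
 * before calling icc_set(); the Tegra provider then programs the
 * latency/priority registers from it in its set() callback.
 *
 *      struct tegra_icc_client_data cfg = {
 *              .latency_us = 1000,
 *              .priority = 2,
 *              .isochronous = true,
 *      };
 *
 *      node->data = &cfg;
 *      icc_set(path, avg_kbps, peak_kbps);
 */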

6. When will the latency part of the interconnect framework be implemented? What features will it add?

Thanks & regards,
Krishna


-----Original Message-----
From: Thierry Reding <thierry.reding@xxxxxxxxx> 
Sent: Wednesday, November 14, 2018 12:36 PM
To: Georgi Djakov <georgi.djakov@xxxxxxxxxx>
Cc: Jonathan Hunter <jonathanh@xxxxxxxxxx>; Peter De Schrijver <pdeschrijver@xxxxxxxxxx>; Dmitry Osipenko <digetx@xxxxxxxxx>; linux-tegra <linux-tegra@xxxxxxxxxxxxxxx>; Krishna Sitaraman <ksitaraman@xxxxxxxxxx>; Sanjay Chandrashekara <sanjayc@xxxxxxxxxx>
Subject: Re: interconnects on Tegra

On Thu, Nov 01, 2018 at 03:06:52PM +0200, Georgi Djakov wrote:
> Hi Thierry & all,
> 
> On 10/29/2018 12:18 PM, Jon Hunter wrote:
> > 
> > On 29/10/2018 10:01, Thierry Reding wrote:
> >> On Fri, Oct 26, 2018 at 06:04:08PM +0300, Georgi Djakov wrote:
> >>> Hi Jon & all
> >>>
> >>> On 10/26/2018 04:48 PM, Jon Hunter wrote:
> >>>> Hi Georgi,
> >>>>
> >>>> On 22/10/2018 17:36, Georgi Djakov wrote:
> >>>>> Hello Jon and Dmitry,
> >>>>>
> >>>>> I am working on API [1] which allows consumer drivers to express 
> >>>>> their bandwidth needs between various SoC components - for 
> >>>>> example from CPU to memory, from video decoders and DSPs etc. 
> >>>>> Then the system can aggregate the needed bandwidth between the 
> >>>>> components and set the on-chip interconnects to the most optimal power/performance profile.
> >>>>>
> >>>>> I was wondering if there is any DVFS management related to 
> >>>>> interconnects on Tegra platforms, as my experience is mostly with Qualcomm hardware.
> >>>>> The reason i am asking is that i want to make sure that the API 
> >>>>> design and the DT bindings would work or at least do not 
> >>>>> conflict with how DVFS is done on Tegra platforms. So do you 
> >>>>> know if there is any bus clock scaling or dynamic interconnect 
> >>>>> configuration done by firmware or software in downstream kernels?
> >>>>>
> >>>>> Thanks,
> >>>>> Georgi
> >>>>>
> >>>>> [1].
> >>>>> 	
> >>>>
> >>>> The downstream kernels do have a bandwidth manager driver for 
> >>>> managing the memory controller speed/latency, however, I am not 
> >>>> sure about the actual internal interconnect itself.
> >>>>
> >>>> Adding the linux-tegra mailing list for visibility.
> >>>>
> >>>
> >>> Thanks! This sounds interesting! I looked at some 4.9 kernel and 
> >>> found references to some bwmgr functions, which look like they can 
> >>> do some dynamic bandwidth scaling. Is the full implementation 
> >>> available publicly? Are there any plans on upstreaming this?
> >>
> >> Cc'ing Peter who's probably the most familiar with all of this. 
> >> We've been discussing this on and off for a while now, and the 
> >> latest consensus was that the existing PM QoS would be a good 
> >> candidate for an API to use for this purpose, albeit maybe not optimal.
> 
> Yes, indeed the PM QoS interface was the closest candidate for 
> extending when looked at this initially. The problem with that 
> approach was that it's not suitable for configuring multi-tiered bus 
> topologies and it might require many changes that might end up 
> conflicting with a lot of the existing stuff.
> 
> >> Generally the way that this works on Tegra is that we have a memory 
> >> bus clock that can be scaled, so we'd need to aggregate all of the 
> >> requests for bandwidth and set a memory clock frequency that allows 
> >> all of those to be met. There are also mechanisms to influence 
> >> latency for certain requests which can be essential to make sure 
> >> isochronous clients work properly under memory pressure. I'm not 
> >> sure we can even get into those situations with the feature set 
> >> available upstream, but it's certainly something that's important 
> >> once we do a lot of GPU, display and multimedia in parallel.
> 
> Thank you! This sounds very similar to the problem i am trying to solve.
> It seems to me that the interconnect API would be a perfect fit for 
> Tegra too. There is a proposal for device-tree binding to describe the 
> path between SoC components and am trying to collect more information 
> whether this would be useful for other platforms. If you have any 
> comments, feel free to respond to the discussion [2].
> 
> The general idea is that you use the "interconnects" properties in DT 
> to describe paths that are used by devices. The interconnect API 
> follows the consumer-provider model already used by the clock and 
> regulator frameworks and the usage is similar. Developers need to 
> implement platform-specific provider drivers that know the SoC 
> topology and do aggregation and low-level hardware configuration. I am 
> not sure what would be the exact implementation for Tegra platforms, 
> but i expect that it involves changing the rate of some clocks or writing to some registers.
> 
> Thanks,
> Georgi
> 
> [2]. https://lore.kernel.org/lkml/20180925180215.GA12435@bogus/

Hi Georgi,

looping in Krishna and Sanjay who are most familiar with our downstream "interconnect" code. We've discussed this over the past few days and they can provide more detailed feedback on what we currently use and additional requirements that we have and that perhaps could be incorporated in the interconnect framework.

Thierry



