On 18/10/16 16:32, Matthijs van Duin wrote:
*bump*
Sorry, I think this was buried somewhere in my mailbox.
On 16 August 2016 at 15:44, Nishanth Menon <nm@xxxxxx> wrote:
On 08/15/2016 11:44 PM, Matthijs van Duin wrote:
It is worth mentioning that based on tests I've done (on an
omap5-uevm), the clock speed of the async L3 bridge appears to be the
bottleneck for traffic between the Cortex-A15 and the L3 interconnect.
As a result, if the bridge clock divider is adjusted as mandated by
the datasheet, L3-heavy workloads will actually perform noticeably
worse at OPP_HIGH (bridge @ 1500/8 = 187.5 MHz) than at OPP_NOM
(bridge @ 1000/4 = 250 MHz).
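The arithmetic behind those figures can be checked with a small helper. This is only an illustrative sketch: the function name bridge_khz is made up here, and the rates are the MPU clock and divider values quoted above, expressed in kHz so that 187.5 MHz stays an integer.

```c
/*
 * Illustrative only: bridge_khz() is a hypothetical helper, not a
 * kernel function. The async bridge clock is modelled as the MPU
 * clock divided by the bridge divider, as in the figures quoted above.
 */
static unsigned long bridge_khz(unsigned long mpu_khz, unsigned int divider)
{
	return mpu_khz / divider;
}
```

With the values from this thread, bridge_khz(1000000, 4) gives 250000 (OPP_NOM), bridge_khz(1500000, 8) gives 187500 (OPP_HIGH with the mandated divider), and bridge_khz(1500000, 4) gives 375000 (OPP_HIGH with the divider left at /4).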
This is an awkward situation.
It would be nice to know what the actual maximum clock speed permitted
for the async bridge is, to have an alternative OPP_HIGH mode where
the bridge divider is left at /4 and the cpu speed only raised as much
as the async bridge will tolerate. I have however not found this
specified in the datasheet.
Also, what are the risks involved in overclocking the async bridge as
is done currently? Will it adversely affect device lifetime? Result in
silent data corruption? Cause protocol errors resulting in deadlock?
That would be an invalid SoC operating condition.
That doesn't actually answer the question, though.
It seems curious to me that this problem presumably affects every
omap5 and dra75x/am572x (and possibly dra72x/am571x?) currently out
there, yet no problems have been encountered (or at least been
attributed to it), even though leaving the divider at /4 would seem to
severely overclock it (at 375 MHz). I have myself not observed any
problems while attempting to flood the interface (from/to TILER).
A rare deadlock may actually be an acceptable performance tradeoff for
some applications, especially since the cortex-A15 itself already
includes rare deadlocks, some with performance-affecting workarounds
(801819, https://patchwork.kernel.org/patch/6960921/), some with no
workaround (799271).
On the other hand, random silent corruption or a serious reduction in
device POH would be less likely to be considered acceptable by anyone.
Some clarification from TI would be very welcome here.
Hmm... At first look this does seem to be a miss in our configuration. I
recollect that in the older evil-vendor production kernels (circa 3.0/3.4)
we had handled this, but much has changed since then.
Tero is on vacation atm, will have to see how he'd like to handle this.
Any news?
I guess there are at least two ways to fix this:
1) Set up the async bridge dividers within the DPLL code for the MPU
DPLL. You probably need a new ti_clk_features flag that gets set up in
SoC setup for OMAP5 and gets checked in the DPLL code, and apply the
bridge dividers as needed. Slightly hackish, yes.
2) Or, you could add completely new clock nodes for the bridges which
are children of MPU DPLL. These would automatically adjust their
internal divider based on the parent clock rate changes. This would have
the added benefit of the bridge clock rates being visible as separate
clock entries for debugging purposes. (Assuming dynamic clock rate
changes like this work properly within CCF, this would need some testing.)
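As a rough sketch of the divider selection that either option would need, the helper below picks the smallest divider that keeps the bridge at or below a limit. Everything named here is an assumption for illustration: BRIDGE_MAX_HZ is a hypothetical limit (250 MHz is just the OPP_NOM bridge rate mentioned earlier in the thread, not a documented maximum), the divider table only lists the /4 and /8 values discussed above, and bridge_pick_divider is not an existing kernel function and is not wired into CCF.

```c
#include <stddef.h>

/* Hypothetical limit: the real maximum bridge rate is not given here. */
#define BRIDGE_MAX_HZ 250000000ULL

/* Only the two divider values mentioned in this thread, smallest first. */
static const unsigned int bridge_dividers[] = { 4, 8 };

/*
 * Return the smallest divider for which parent_hz / divider does not
 * exceed BRIDGE_MAX_HZ. The comparison is done by multiplication so
 * that integer division cannot round an over-limit rate down to the
 * limit. Falls back to the largest divider if none is sufficient.
 */
static unsigned int bridge_pick_divider(unsigned long parent_hz)
{
	size_t n = sizeof(bridge_dividers) / sizeof(bridge_dividers[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if ((unsigned long long)parent_hz <=
		    BRIDGE_MAX_HZ * bridge_dividers[i])
			return bridge_dividers[i];
	}
	return bridge_dividers[n - 1];
}
```

Under these assumptions a 1 GHz parent keeps the /4 divider and a 1.5 GHz parent switches to /8. In option 2 the same decision would be made whenever the parent rate changes.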
Based on some internal discussions, and the fact that the data manual
for DRA7 doesn't mention this feature (even though it is listed in the
TRM), my current assumption is that it is not needed on that SoC.
-Tero