On 21/02/2020 19:19, Dmitry Osipenko wrote: > 21.02.2020 20:36, Daniel Lezcano пишет: >> On Fri, Feb 21, 2020 at 07:56:51PM +0300, Dmitry Osipenko wrote: >>> Hello Daniel, >>> >>> 21.02.2020 18:43, Daniel Lezcano пишет: >>>> On Thu, Feb 13, 2020 at 02:51:26AM +0300, Dmitry Osipenko wrote: >>>>> It is possible that something may go wrong with the secondary CPU, in that >>>>> case it is much nicer to get a dump of the flow-controller state before >>>>> hanging machine. >>>>> >>>>> Acked-by: Peter De Schrijver <pdeschrijver@xxxxxxxxxx> >>>>> Tested-by: Peter Geis <pgwipeout@xxxxxxxxx> >>>>> Tested-by: Jasper Korten <jja2000@xxxxxxxxx> >>>>> Tested-by: David Heidelberg <david@xxxxxxx> >>>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> >>>>> --- >> >> [ ... ] >> >>>>> +static int tegra20_wait_for_secondary_cpu_parking(void) >>>>> +{ >>>>> + unsigned int retries = 3; >>>>> + >>>>> + while (retries--) { >>>>> + ktime_t timeout = ktime_add_ms(ktime_get(), 500); >>>> >>>> Oops I missed this one. Do not use ktime_get() in this code path, use jiffies. >>> >>> Could you please explain what benefits jiffies have over the ktime_get()? >> >> ktime_get() is very slow, jiffies is updated every tick. > > But how jiffies are supposed to be updated if interrupts are disabled? Yeah, other cpus must not be idle in this. > Aren't jiffies actually slower than ktime_get() because jiffies are > updating every 10/1ms (depending on CONFIG_HZ)? They are no slower, they have a lower resolution which is 10ms or 4ms. Given the 500ms timeout, it is fine. > We're kinda interesting here in getting into deep-idling state as quick > as possible. I was checking how much time takes the busy-loop below and > it takes ~40-150us in average, which is good enough. ktime_get() gets a seq lock and it is very slow. >>>>> + >>>>> + /* >>>>> + * The primary CPU0 core shall wait for the secondaries >>>>> + * shutdown in order to power-off CPU's cluster safely. >>>>> + * The timeout value depends on the current CPU frequency, >>>>> + * it takes about 40-150us in average and over 1000us in >>>>> + * a worst case scenario. >>>>> + */ >>>>> + do { >>>>> + if (tegra_cpu_rail_off_ready()) >>>>> + return 0; >>>>> + >>>>> + } while (ktime_before(ktime_get(), timeout)); >>>> >>>> So this loop will aggresively call tegra_cpu_rail_off_ready() and retry 3 >>>> times. The tegra_cpu_rail_off_ready() function can be called thoushand of times >>>> here but the function will hang 1.5s :/ >>>> >>>> I suggest something like: >>>> >>>> while (retries--i && !tegra_cpu_rail_off_ready()) >>>> udelay(100); >>>> >>>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum >>>> impact. >>> But udelay() also results into CPU spinning in a busy-loop, and thus, >>> what's the difference? >> >> busy looping instead of register reads with all the hardware things involved behind. > > Please notice that this code runs only on an older Cortex-A9/A15, which > doesn't support WFE for the delaying, and thus, CPU always busy-loops > inside udelay(). > > What about if I'll add cpu_relax() to the loop? Do you think it it could > have any positive effect? I think udelay() has a call to cpu_relax(). -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog