21.02.2020 20:36, Daniel Lezcano пишет: > On Fri, Feb 21, 2020 at 07:56:51PM +0300, Dmitry Osipenko wrote: >> Hello Daniel, >> >> 21.02.2020 18:43, Daniel Lezcano пишет: >>> On Thu, Feb 13, 2020 at 02:51:26AM +0300, Dmitry Osipenko wrote: >>>> It is possible that something may go wrong with the secondary CPU, in that >>>> case it is much nicer to get a dump of the flow-controller state before >>>> hanging machine. >>>> >>>> Acked-by: Peter De Schrijver <pdeschrijver@xxxxxxxxxx> >>>> Tested-by: Peter Geis <pgwipeout@xxxxxxxxx> >>>> Tested-by: Jasper Korten <jja2000@xxxxxxxxx> >>>> Tested-by: David Heidelberg <david@xxxxxxx> >>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> >>>> --- > > [ ... ] > >>>> +static int tegra20_wait_for_secondary_cpu_parking(void) >>>> +{ >>>> + unsigned int retries = 3; >>>> + >>>> + while (retries--) { >>>> + ktime_t timeout = ktime_add_ms(ktime_get(), 500); >>> >>> Oops I missed this one. Do not use ktime_get() in this code path, use jiffies. >> >> Could you please explain what benefits jiffies have over the ktime_get()? > > ktime_get() is very slow, jiffies is updated every tick. But how jiffies are supposed to be updated if interrupts are disabled? Aren't jiffies actually slower than ktime_get() because jiffies are updating every 10/1ms (depending on CONFIG_HZ)? We're kinda interesting here in getting into deep-idling state as quick as possible. I was checking how much time takes the busy-loop below and it takes ~40-150us in average, which is good enough. >>>> + >>>> + /* >>>> + * The primary CPU0 core shall wait for the secondaries >>>> + * shutdown in order to power-off CPU's cluster safely. >>>> + * The timeout value depends on the current CPU frequency, >>>> + * it takes about 40-150us in average and over 1000us in >>>> + * a worst case scenario. >>>> + */ >>>> + do { >>>> + if (tegra_cpu_rail_off_ready()) >>>> + return 0; >>>> + >>>> + } while (ktime_before(ktime_get(), timeout)); >>> >>> So this loop will aggresively call tegra_cpu_rail_off_ready() and retry 3 >>> times. The tegra_cpu_rail_off_ready() function can be called thoushand of times >>> here but the function will hang 1.5s :/ >>> >>> I suggest something like: >>> >>> while (retries--i && !tegra_cpu_rail_off_ready()) >>> udelay(100); >>> >>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum >>> impact. >> But udelay() also results into CPU spinning in a busy-loop, and thus, >> what's the difference? > > busy looping instead of register reads with all the hardware things involved behind. Please notice that this code runs only on an older Cortex-A9/A15, which doesn't support WFE for the delaying, and thus, CPU always busy-loops inside udelay(). What about if I'll add cpu_relax() to the loop? Do you think it it could have any positive effect?