21.02.2020 23:21, Dmitry Osipenko wrote:
> 21.02.2020 23:02, Daniel Lezcano wrote:
>> On 21/02/2020 19:19, Dmitry Osipenko wrote:
>>> 21.02.2020 20:36, Daniel Lezcano wrote:
>>>> On Fri, Feb 21, 2020 at 07:56:51PM +0300, Dmitry Osipenko wrote:
>>>>> Hello Daniel,
>>>>>
>>>>> 21.02.2020 18:43, Daniel Lezcano wrote:
>>>>>> On Thu, Feb 13, 2020 at 02:51:26AM +0300, Dmitry Osipenko wrote:
>>>>>>> It is possible that something may go wrong with the secondary CPU, in that
>>>>>>> case it is much nicer to get a dump of the flow-controller state before
>>>>>>> hanging machine.
>>>>>>>
>>>>>>> Acked-by: Peter De Schrijver <pdeschrijver@xxxxxxxxxx>
>>>>>>> Tested-by: Peter Geis <pgwipeout@xxxxxxxxx>
>>>>>>> Tested-by: Jasper Korten <jja2000@xxxxxxxxx>
>>>>>>> Tested-by: David Heidelberg <david@xxxxxxx>
>>>>>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>>>>>>> ---
>>>>
>>>> [ ... ]
>>>>
>>>>>>> +static int tegra20_wait_for_secondary_cpu_parking(void)
>>>>>>> +{
>>>>>>> +	unsigned int retries = 3;
>>>>>>> +
>>>>>>> +	while (retries--) {
>>>>>>> +		ktime_t timeout = ktime_add_ms(ktime_get(), 500);
>>>>>>
>>>>>> Oops I missed this one. Do not use ktime_get() in this code path, use jiffies.
>>>>>
>>>>> Could you please explain what benefits jiffies have over the ktime_get()?
>>>>
>>>> ktime_get() is very slow, jiffies is updated every tick.
>>>
>>> But how are jiffies supposed to be updated if interrupts are disabled?
>>
>> Yeah, other cpus must not be idle in this.
>
> Okay, then jiffies can't be used here because this function is used for
> the coupled / power-gated state only. All CPUs are idling in this state.
>
>>> Aren't jiffies actually slower than ktime_get() because jiffies are
>>> updating every 10/1ms (depending on CONFIG_HZ)?
>>
>> They are no slower, they have a lower resolution which is 10ms or 4ms.
>>
>> Given the 500ms timeout, it is fine.
>>
>>> We're kinda interested here in getting into deep-idling state as quick
>>> as possible. I was checking how much time the busy-loop below takes and
>>> it takes ~40-150us on average, which is good enough.
>>
>> ktime_get() gets a seq lock and it is very slow.
>
> Since all CPUs are idling here, the locking isn't a problem.
>
> The wait_for_secondary_cpu_parking() function is called on CPU0, it
> waits for the secondary CPUs to enter into safe-state before CPU0 could
> power-gate the whole CPU cluster.
>
>>>>>>> +
>>>>>>> +		/*
>>>>>>> +		 * The primary CPU0 core shall wait for the secondaries
>>>>>>> +		 * shutdown in order to power-off CPU's cluster safely.
>>>>>>> +		 * The timeout value depends on the current CPU frequency,
>>>>>>> +		 * it takes about 40-150us in average and over 1000us in
>>>>>>> +		 * a worst case scenario.
>>>>>>> +		 */
>>>>>>> +		do {
>>>>>>> +			if (tegra_cpu_rail_off_ready())
>>>>>>> +				return 0;
>>>>>>> +
>>>>>>> +		} while (ktime_before(ktime_get(), timeout));
>>>>>>
>>>>>> So this loop will aggressively call tegra_cpu_rail_off_ready() and retry 3
>>>>>> times. The tegra_cpu_rail_off_ready() function can be called thousands of times
>>>>>> here but the function will hang 1.5s :/
>>>>>>
>>>>>> I suggest something like:
>>>>>>
>>>>>> 	while (retries-- && !tegra_cpu_rail_off_ready())
>>>>>> 		udelay(100);
>>>>>>
>>>>>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum
>>>>>> impact.
>>>>> But udelay() also results in the CPU spinning in a busy-loop, and thus,
>>>>> what's the difference?
>>>>
>>>> busy looping instead of register reads with all the hardware things involved behind.
>>>
>>> Please notice that this code runs only on an older Cortex-A9/A15, which
>>> doesn't support WFE for the delaying, and thus, CPU always busy-loops
>>> inside udelay().
>>>
>>> What if I add cpu_relax() to the loop? Do you think it could
>>> have any positive effect?
>>
>> I think udelay() has a call to cpu_relax().
>
> Yes, my point is that udelay() doesn't bring much benefit for us here
> because:
>
> 1. we want to enter into power-gated state as quick as possible and
> udelay() just adds an unnecessary delay
>
> 2. udelay() spins in a busy-loop until delay is expired, just like we're
> doing it in this function already

I'll try the udelay()-loop over the weekend and will see if it makes any
real difference, maybe I'm missing something. If it doesn't make any
difference, I'll leave this patch as-is, okay?
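
For reference, the two waiting strategies discussed in this thread can be
compared side by side in a small userspace sketch. This is not kernel code
and not the patch itself: fake_rail_off_ready() is a hypothetical stand-in
for tegra_cpu_rail_off_ready(), clock_gettime(CLOCK_MONOTONIC) stands in
for ktime_get(), spin_delay_ns() stands in for udelay(), and -1 stands in
for a timeout error code. The deadline variant is also simplified to a
single timeout window instead of the patch's three retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for tegra_cpu_rail_off_ready(): it reports
 * "ready" once it has been polled polls_until_ready times. */
static unsigned int polls_until_ready;
static unsigned int polls_done;

static bool fake_rail_off_ready(void)
{
	polls_done++;
	return polls_done >= polls_until_ready;
}

/* Userspace stand-in for ktime_get(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Userspace stand-in for udelay(): spins until the delay has expired. */
static void spin_delay_ns(uint64_t ns)
{
	uint64_t end = now_ns() + ns;

	while (now_ns() < end)
		;
}

/* Variant A (shape of the patch): poll back-to-back until a deadline.
 * Returns as soon as the condition is true; the number of polls is
 * unbounded, only the total waiting time is bounded. */
static int wait_deadline(uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;

	do {
		if (fake_rail_off_ready())
			return 0;
	} while (now_ns() < deadline);

	return -1;
}

/* Variant B (shape of the suggestion): at most `retries` polls, with a
 * fixed delay after each failed poll. The number of polls is bounded,
 * but success is noticed only at delay granularity. */
static int wait_bounded(unsigned int retries, uint64_t delay_ns)
{
	assert(retries > 0); /* the caller must allow at least one poll */

	while (retries--) {
		if (fake_rail_off_ready())
			return 0;
		spin_delay_ns(delay_ns);
	}

	return -1;
}
```

With this sketch, wait_deadline(500 * 1000000ull) mirrors one 500ms window
of the patch, and wait_bounded(retries, 100 * 1000ull) mirrors the
udelay(100) loop. Both variants keep the CPU spinning, so the visible
difference is only how often the readiness check is issued and how soon
success is noticed.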