22.02.2020 00:11, Daniel Lezcano wrote:
> On 21/02/2020 21:54, Dmitry Osipenko wrote:
>> 21.02.2020 23:48, Daniel Lezcano wrote:
>>> On 21/02/2020 21:21, Dmitry Osipenko wrote:
>>>> 21.02.2020 23:02, Daniel Lezcano wrote:
>>>
>>> [ ... ]
>>>
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * The primary CPU0 core shall wait for the secondaries
>>>>>>>>>> +	 * shutdown in order to power-off CPU's cluster safely.
>>>>>>>>>> +	 * The timeout value depends on the current CPU frequency,
>>>>>>>>>> +	 * it takes about 40-150us in average and over 1000us in
>>>>>>>>>> +	 * a worst case scenario.
>>>>>>>>>> +	 */
>>>>>>>>>> +	do {
>>>>>>>>>> +		if (tegra_cpu_rail_off_ready())
>>>>>>>>>> +			return 0;
>>>>>>>>>> +
>>>>>>>>>> +	} while (ktime_before(ktime_get(), timeout));
>>>>>>>>>
>>>>>>>>> So this loop will aggressively call tegra_cpu_rail_off_ready() and retry 3
>>>>>>>>> times. The tegra_cpu_rail_off_ready() function can be called thousands of times
>>>>>>>>> here but the function will hang 1.5s :/
>>>>>>>>>
>>>>>>>>> I suggest something like:
>>>>>>>>>
>>>>>>>>> 	while (retries-- && !tegra_cpu_rail_off_ready())
>>>>>>>>> 		udelay(100);
>>>>>>>>>
>>>>>>>>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum
>>>>>>>>> impact.
>>>>>>>> But udelay() also results into CPU spinning in a busy-loop, and thus,
>>>>>>>> what's the difference?
>>>>>>>
>>>>>>> busy looping instead of register reads with all the hardware things involved behind.
>>>>>>
>>>>>> Please notice that this code runs only on an older Cortex-A9/A15, which
>>>>>> doesn't support WFE for the delaying, and thus, CPU always busy-loops
>>>>>> inside udelay().
>>>>>>
>>>>>> What about if I'll add cpu_relax() to the loop? Do you think it could
>>>>>> have any positive effect?
>>>>>
>>>>> I think udelay() has a call to cpu_relax().
>>>>
>>>> Yes, my point is that udelay() doesn't bring much benefit for us here
>>>> because:
>>>>
>>>> 1. we want to enter into power-gated state as quick as possible and
>>>> udelay() just adds an unnecessary delay
>>>>
>>>> 2. udelay() spins in a busy-loop until delay is expired, just like we're
>>>> doing it in this function already
>>>
>>> In this case why not remove ktime_get() and increase the number of retries?
>>
>> Because the busy-loop performance depends on CPU's frequency, so we
>> can't rely on a bare number of the retries.
>
> Why not if computed in the worst case scenario?

There are always at least a few dozen microseconds to wait, so
something like udelay(10) should be a somewhat better variant anyway.

> Anyway, I'll let you give a try.

It turned out that udelay(10) is noticeably better than the ktime_get()
loop when the system is running at lower frequencies. I'll switch to
udelay() in v10, thank you very much for the suggestion!
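
For readers following the thread, here is a rough user-space sketch of the
bounded polling pattern discussed above: check the status, wait a fixed
10us step, and give up after a fixed number of retries. It is not the v10
patch. The names fake_rail_off_ready(), fake_udelay(), reset_sim() and
wait_rail_off() are hypothetical stand-ins for tegra_cpu_rail_off_ready()
and udelay(), and the budget of 150 retries x 10us = 1500us is only an
assumption chosen to cover the "over 1000us" worst case mentioned in the
quoted comment.

```c
#include <stdbool.h>

/* Simulation state; hypothetical, for illustration only. */
static unsigned int polls_until_ready;	/* status reads that return false */
static unsigned int poll_count;		/* status reads performed so far */
static unsigned int delayed_us;		/* total simulated delay */

/* Stand-in for tegra_cpu_rail_off_ready(): true after N false reads. */
static bool fake_rail_off_ready(void)
{
	return ++poll_count > polls_until_ready;
}

/* Stand-in for udelay(): only accounts for the requested time. */
static void fake_udelay(unsigned int us)
{
	delayed_us += us;
}

/* Reset the simulation so the status becomes ready after latency_polls reads. */
static void reset_sim(unsigned int latency_polls)
{
	polls_until_ready = latency_polls;
	poll_count = 0;
	delayed_us = 0;
}

/*
 * Poll the status at most `retries` times, delaying `step_us` between
 * reads. Returns 0 once the status is ready, -1 if the budget runs out.
 * The worst-case wait is bounded by retries * step_us regardless of how
 * fast the CPU executes the loop itself.
 */
static int wait_rail_off(unsigned int retries, unsigned int step_us)
{
	while (retries--) {
		if (fake_rail_off_ready())
			return 0;

		fake_udelay(step_us);
	}

	return -1;
}
```

With reset_sim(4), a call to wait_rail_off(150, 10) succeeds on the fifth
status read after 40us of accumulated delay; with reset_sim(1000) it gives
up after 150 reads and 1500us. The point of the fixed step is that the
timeout is expressed in time rather than in loop iterations, so it does
not shrink or grow with the CPU frequency.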