Re: [PATCH v9 01/15] asm-generic: add barrier smp_cond_load_relaxed_timeout()

Ankur Arora <ankur.a.arora@xxxxxxxxxx> · Thu, 14 Nov 2024 16:28:12 -0800

Catalin Marinas <catalin.marinas@xxxxxxx> writes:

> On Fri, Nov 08, 2024 at 11:41:08AM -0800, Christoph Lameter (Ampere) wrote:
>> On Thu, 7 Nov 2024, Ankur Arora wrote:
>> > > Calling the clock retrieval function repeatedly should be fine and is
>> > > typically done in user space as well as in kernel space for functions that
>> > > need to wait short time periods.
>> >
>> > The problem is that you might have multiple CPUs polling in idle
>> > for prolonged periods of time. And, so you want to minimize
>> > your power/thermal envelope.
>>
>> On ARM that maps to YIELD which does not do anything for the power
>> envelope AFAICT. It switches to the other hyperthread.
>
> The issue is not necessarily arm64 but poll_idle() on other
> architectures like x86 where, at the end of this series, they still call
> cpu_relax() in a loop and check local_clock() every 200 or so
> iterations. So I wouldn't want to revert the improvement in 4dc2375c1a4e
> ("cpuidle: poll_state: Avoid invoking local_clock() too often").
>
> I agree that the 200 iterations here is pretty random and it was
> something made up for poll_idle() specifically and it could increase the
> wait period in other situations (or other architectures).
>
> OTOH, I'm not sure we want to make this API too complex if the only
> user for a while would be poll_idle(). We could add a comment that the
> timeout granularity can be pretty coarse and architecture dependent (200
> cpu_relax() calls in one deployment, 100us on arm64 with WFE).

Yeah, agreed. Not worth over-engineering this interface, at least not
until there are other users. For now I'll just add a comment mentioning
that the time-check is only coarse-grained and architecture dependent.
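
To illustrate what "coarse-grained" means here, a rough userspace sketch
of the pattern under discussion: spin on a flag, relax between reads, and
only look at the clock once every N iterations. This is not the kernel
code. The names poll_flag_timeout(), now_ns(), relax() and RELAX_COUNT are
made up for the example, relax() is only a compiler barrier standing in for
cpu_relax(), and it assumes a POSIX system with GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: relax iterations between clock reads (200 as quoted above). */
#define RELAX_COUNT 200

/* Userspace stand-in for local_clock(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Userspace stand-in for cpu_relax(): only a compiler barrier here. */
static inline void relax(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/*
 * Spin until *flag becomes non-zero or roughly timeout_ns has elapsed.
 * Returns true if the flag was seen set, false on timeout.
 *
 * The clock is read only once per RELAX_COUNT iterations, so the wait
 * can overshoot timeout_ns by up to RELAX_COUNT relax() calls.
 */
static bool poll_flag_timeout(const volatile int *flag, uint64_t timeout_ns)
{
	uint64_t start = now_ns();
	unsigned int loops = 0;

	assert(flag);

	while (!*flag) {
		relax();
		if (++loops < RELAX_COUNT)
			continue;
		loops = 0;
		if (now_ns() - start >= timeout_ns)
			return false;
	}
	return true;
}
```

The overshoot is bounded by however long RELAX_COUNT iterations take,
which is exactly the architecture-dependent part the comment would need
to call out.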

--
ankur