Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture

Ananth Narayan <ananth.narayan@xxxxxxx> · Fri, 23 Sep 2022 21:45:45 +0530

Eric,
The MTA mangled your address. So the note is not showing up on the list.
Responding so this hopefully shows up on lkml for the records.

Apologies to everyone else for duplicates.

Regards,
Ananth

On 23-09-2022 09:31 pm, AMD\ermorton wrote:
> On 2022-09-21 14:15, David Hansen wrote:
> 
>>> Do X86_FEATURE_ZEN CPUs just have unusually painful
>>> inl(acpi_fadt.xpm_tmr_blk.address) implementations?
> 
> Hi David,
> 
> I'm glad you asked this.
> 
> Obviously the words "painful" and "slow" are arbitrary. But... since there are many aspects such as the platform, core clock frequency, system clock frequency, etc, that play into this, I will refrain from any precise numbers.
> 
> I would say that x86 platforms (that today have in excess of a hundred processors) generally design the legacy PM_TMR and other serial resources in the Southbridge/FCH with the underlying assumption that (a) the kernel accesses them "rarely" in non-performance sensitive code and, more importantly, (b) that it is unlikely to have multiple processors access them "simultaneously". These resources are a fair distant from the processor, and unlike memory controllers, these resources were not designed to have multiple simultaneous accesses running in parallel.
> 
> So let's assert that, to start off with, the accesses are already "slow" from the processor standpoint because of this distance. It is likely that most x86 implementations could easily take around 500ns-1us round trip. The exact number will vary, but a quick sanity test on current x86 production platforms match that for a "singleton" access.
> 
> That alone is well over 1000 core clocks and seems to be reason enough to avoid doing this INL when it is not necessary.
> 
> But as the PM_TMR is not designed to handle simultaneous accesses, if multiple processors do simultaneously access this resource (or even "close to simultaneous"), the first access might be "slow", the second access might be "slower", and well, the 100th access might be "painful". And there are interrupt cases where this can indeed happen - due to this ancient workaround...
> 
> Note that a quick sanity test that we created when we understood this tbench data suggested to me that Intel platforms are not immune from the impact of this worst-case access pattern either. This is not surprising to me. But we did not do an exhaustive check.
> 
> Sincerely,
> Eric Morton
> AMD Infinity Fabric and SOC Architecture