Re: [RESEND PATCH] arm64: arch_timer: Workaround for Cortex-A73 erratum 858921

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 30/08/17 09:20, Leo Yan wrote:
> Hi Marc,
> 
> On Wed, Aug 30, 2017 at 09:02:47AM +0100, Marc Zyngier wrote:
>> + Mark
>>
>> On 30/08/17 06:06, Leo Yan wrote:
>>> commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.
>>>
>>> Cortex-A73 (all versions) counter read can return a wrong value
>>> when the counter crosses a 32bit boundary.
>>>
>>> The workaround involves performing the read twice, and to return
>>> one or the other depending on whether a transition has taken place.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>>  arch/arm64/Kconfig                  | 12 ++++++++++++
>>>  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++
>>>  2 files changed, 23 insertions(+)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 14cdc6d..68e7c98 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419
>>>  
>>>  	  If unsure, say Y.
>>>  
>>> +config ARM64_ERRATUM_858921
>>> +	bool "Cortex-A73: 858921: arch timer counter read can return a wrong value"
>>> +	default y
>>> +	depends on ARM_ARCH_TIMER && ARM64
>>> +	help
>>> +	  This option enables a workaround applicable to Cortex-A73
>>> +	  (all versions), whose counter may return incorrect values.
>>> +	  The workaround will be dynamically enabled when an affected
>>> +	  core is detected.
>>> +
>>> +	  If unsure, say Y.
>>> +
>>>  config CAVIUM_ERRATUM_22375
>>>  	bool "Cavium erratum 22375, 24313"
>>>  	default y
>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>> index fbe0ca3..9b2b0f5 100644
>>> --- a/arch/arm64/include/asm/arch_timer.h
>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>> @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void)
>>>  	return 0;
>>>  }
>>>  
>>> +#ifdef CONFIG_ARM64_ERRATUM_858921
>>> +static inline u64 arch_counter_get_cntvct(void)
>>> +{
>>> +	u64 old, new;
>>> +
>>> +	asm volatile("mrs %0, cntvct_el0" : "=r" (old));
>>> +	asm volatile("mrs %0, cntvct_el0" : "=r" (new));
>>> +	return (((old ^ new) >> 32) & 1) ? old : new;
>>> +}
>>> +#else
>>>  static inline u64 arch_counter_get_cntvct(void)
>>>  {
>>>  	u64 cval;
>>> @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void)
>>>  
>>>  	return cval;
>>>  }
>>> +#endif
>>>  
>>>  static inline int arch_timer_arch_init(void)
>>>  {
>>>
>>
>> This is completely busted:
>>
>> - You are only addressing the kernel side, and ignore userspace (which
>> is just as broken as the kernel).
> 
> Thanks for quick reviewing.
> 
> Yeah, I remembered there have some code directly reads arch timer
> virtual counter from userspace. But can you remind for userspace
> broken issue, is the code in libc or vdso?

You're missing the point. The virtual counter is freely available to
userspace to play with (this is a de-facto ABI). Since it can return bad
values, it needs to be trapped to be correctly emulated (together with
cntfrq_el0), and the VDSO disabled.

> Here I have another question is: after applied the whole workaround
> infrastructure, can it also fix userspace broken issue?

Yup. That's why I ended-up with a 18 patches series, and not just this
single one. Trust me, I'm lazy. There is nothing I hate more than doing
useless work.

>> - You lie in the description of the option (this is in no way dynamic,
>> since you didn't backport the whole workaround infrastructure).
> 
> I should fix it.
> 
>> So I'm afraid I'm NAKing this. Please refrain from blindly backporting
>> random patches. fa8d815fac96e7c9 only makes sense in the context of the
>> whole series, and on its own gives you a very false sense of having
>> properly addressed it.
> 
> IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like
> for sched_clock() roll back issue [1].

What's the point of fixing the kernel if userspace is just as likely to
fail?

> So for this issue, are you suggesting we need backport whole workaround
> infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working
> with these two kernels.

Then these systems are completely broken if they use a Cortex-A73.
Either they run mainline (which will be just fine), or they get a fully
backported workaround infrastructure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]