Hi Scott: I have read this patch, looks fine to me, and why not add unlikely to cval_old != cval_new? I think this problem is hard to occur. Thanks. Ding On 2016/6/22 9:45, Scott Wood wrote: > On Fri, 2016-05-13 at 11:24 +0100, Marc Zyngier wrote: >> On Thu, 12 May 2016 23:37:39 -0500 >> Scott Wood <oss@xxxxxxxxxxxx> wrote: >> >> Hi Scott, >> >>> Erratum A-008585 says that the ARM generic timer counter "has the >>> potential to contain an erroneous value for a small number of core >>> clock cycles every time the timer value changes". Accesses to TVAL >>> (both read and write) are also affected due to the implicit counter >>> read. Accesses to CVAL are not affected. >>> >>> The workaround is to reread TVAL and count registers until successive >>> reads >>> return the same value, and when writing TVAL to retry until counter >>> reads before and after the write return the same value. >>> >>> This erratum can be found on LS1043A and LS2080A. >>> >>> Signed-off-by: Scott Wood <oss@xxxxxxxxxxxx> >>> --- >>> v2: >>> Significant rework based on feedback, including using static_key, >>> disabling VDSO counter access rather than adding the workaround to the >>> VDSO, and uninlining the loops. >>> >>> Dropped the separate property for indicating that writes to TVAL are >>> affected, as I believe that's just a side effect of the implicit >>> counter read being corrupted, and thus a chip that is affected by one >>> will always be affected by the other. >>> >>> Dropped the arm32 portion as it seems there was confusion about whether >>> LS1021A is affected. Currently I am being told that it is not >>> affected. >>> >>> I considered writing to CVAL rather than looping on TVAL writes, but >>> that would still have required separate set_next_event() code for the >>> erratum, and adding CVAL to the enum would have required a bunch of >>> extra handlers in switch statements (even where unused, due to compiler >>> warnings about unhandled enum values) including in an arm32 header. It >>> seemed better to avoid the arm32 interaction and new untested >>> accessors. >>> --- >>> .../devicetree/bindings/arm/arch_timer.txt | 6 ++ >>> arch/arm64/include/asm/arch_timer.h | 37 +++++-- >>> drivers/clocksource/arm_arch_timer.c | 110 >>> +++++++++++++++++++++ >>> 3 files changed, 144 insertions(+), 9 deletions(-) >>> >>> diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt >>> b/Documentation/devicetree/bindings/arm/arch_timer.txt >>> index e774128..ef5fbe9 100644 >>> --- a/Documentation/devicetree/bindings/arm/arch_timer.txt >>> +++ b/Documentation/devicetree/bindings/arm/arch_timer.txt >>> @@ -25,6 +25,12 @@ to deliver its interrupts via SPIs. >>> - always-on : a boolean property. If present, the timer is powered >>> through an >>> always-on power domain, therefore it never loses context. >>> >>> +- fsl,erratum-a008585 : A boolean property. Indicates the presence of >>> + QorIQ erratum A-008585, which says that reading the counter is >>> + unreliable unless the same value is returned by back-to-back reads. >>> + This also affects writes to the tval register, due to the implicit >>> + counter read. >>> + >>> ** Optional properties: >>> >>> - arm,cpu-registers-not-fw-configured : Firmware does not initialize >> >> This should be part of a separate patch. Also, errata should be >> documented in Documentation/arm64/silicon-errata.txt. > > Oh right, forgot. > >>> +extern struct static_key_false arch_timer_read_ool_enabled; >>> + >>> +#define ARCH_TIMER_REG_READ(reg, func) \ >>> +extern u64 func##_ool(void); \ >>> +static inline u64 __##func(void) \ >>> +{ \ >>> + u64 val; \ >>> + asm volatile("mrs %0, " reg : "=r" (val)); \ >>> + return val; \ >>> +} \ >>> +static inline u64 _##func(void) \ >>> +{ \ >>> + if (static_branch_unlikely(&arch_timer_read_ool_enabled)) \ >>> + return func##_ool(); \ >>> + else \ >>> + return __##func(); \ >>> +} >>> + >>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval) >>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval) >>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct) >>> + >> >> Given that this will have a (small) impact on non-affected platforms, >> it'd be good to have this guarded by a erratum-specific config option >> (CONFIG_FSL_ERRATUM_008585?) and turn it into trivial accessors when not >> defined. > > OK. > >> >>> /* >>> * These register accessors are marked inline so the compiler can >>> * nicely work out which register we want, and chuck away the rest of >>> @@ -66,19 +89,19 @@ u32 arch_timer_reg_read_cp15(int access, enum >>> arch_timer_reg reg) >>> if (access == ARCH_TIMER_PHYS_ACCESS) { >>> switch (reg) { >>> case ARCH_TIMER_REG_CTRL: >>> - asm volatile("mrs %0, cntp_ctl_el0" : "=r" >>> (val)); >>> + asm volatile("mrs %0, cntp_ctl_el0" : "=r" >>> (val)); >> >> Spurious change? > > The extra spacing seemed to be an attempt to get things to line up between the > CTRL and TVAL asm statements. When the TVAL case was converted to a function > call, there was nothing for the above to line up with, so I moved it back to > normal spacing. > >>> +{ >>> + u64 cval_old, cval_new; >>> + int timeout = 200; >> >> Can we have a comment on how this value has been chosen? > > It's an arbitrary value well beyond the point at which we've seen it fail. > >>> @@ -232,6 +274,50 @@ static __always_inline void set_next_event(const int >>> access, unsigned long evt, >>> arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk); >>> } >>> >>> +#ifdef CONFIG_ARM64 >>> +static __always_inline void rewrite_tval(const int access, >>> + unsigned long evt, struct clock_event_device *clk) >>> +{ >>> + u64 cval_old, cval_new; >>> + int timeout = 200; >>> + >>> + do { >>> + cval_old = __arch_counter_get_cntvct(); >>> + arch_timer_reg_write(access, ARCH_TIMER_REG_TVAL, evt, >>> clk); >>> + cval_new = __arch_counter_get_cntvct(); >> >> Don't you need to guarantee the order of accesses here? > > I'm not 100% sure. The erratum workaround sample code doesn't show any > barriers, and adding more barriers could make it harder for the loop to > successfully complete. There's already a barrier after the write, so the only > concern should be whether the timer read could be reordered after the timer > write, which could cause the loop to exit even if the write was bad. Do you > know if A53 or A57 will reorder a counter read relative to a tval write? > >>> +#endif /* ARM64 */ >>> + >>> static int arch_timer_set_next_event_virt(unsigned long evt, >>> struct clock_event_device *clk) >>> { >>> @@ -277,6 +363,13 @@ static void __arch_timer_setup(unsigned type, >>> clk->set_state_shutdown = >>> arch_timer_shutdown_virt; >>> clk->set_state_oneshot_stopped = >>> arch_timer_shutdown_virt; >>> clk->set_next_event = >>> arch_timer_set_next_event_virt; >>> + >>> +#ifdef CONFIG_ARM64 >>> + if >>> (static_branch_unlikely(&arch_timer_read_ool_enabled)) >>> + clk->set_next_event = >>> + arch_timer_set_next_event_virt_er >>> rata; >> >> On the same line, please. > > I was trying to avoid going beyond 80 columns. > >>> @@ -485,6 +585,13 @@ static void __init arch_counter_register(unsigned >>> type) >>> arch_timer_read_counter = >>> arch_counter_get_cntvct; >>> else >>> arch_timer_read_counter = >>> arch_counter_get_cntpct; >>> + >>> + /* >>> + * Don't use the vdso fastpath if errata require using >>> + * the out-of-line counter accessor. >>> + */ >>> + if (static_branch_unlikely(&arch_timer_read_ool_enabled)) >>> + clocksource_counter.name = >>> "arch_sys_counter_ool"; >> >> This looks really ugly. How about telling the vdso subsystem directly? >> Will, do you have a preference? > > I was following the example set by "arch_mem_counter". > >>> } else { >>> arch_timer_read_counter = arch_counter_get_cntvct_mem; >>> >>> @@ -763,6 +870,9 @@ static void __init arch_timer_of_init(struct >>> device_node *np) >>> >>> arch_timer_c3stop = !of_property_read_bool(np, "always-on"); >>> >>> + if (of_property_read_bool(np, "fsl,erratum-a008585")) >>> + static_branch_enable(&arch_timer_read_ool_enabled); >>> + >>> /* >>> * If we cannot rely on firmware initializing the timer registers >>> then >>> * we should use the physical timers instead. >> >> An outstanding question is how we're going to deal with this in KVM, >> because a guest absolutely needs to know about it (I can definitely see >> time jumping in guests running on a LS2080). > > The property will need to be in the guest's device tree. I'm not too familiar > with how KVM handles device trees on arm... From looking at the QEMU source > it seems that the dtb is passed in by the user. So either that dtb will need > the erratum property in it, or QEMU (and KVM tool?) would need to patch it > into the guest dtb based on seeing the property in /proc/device-tree. > > -Scott > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > . > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html