On 1/5/22 7:24 PM, Suzuki K Poulose wrote: > On 05/01/2022 11:16, Anshuman Khandual wrote: >> >> >> On 1/5/22 3:43 PM, Suzuki K Poulose wrote: >>> Hi Anshuman >>> >>> On 05/01/2022 05:05, Anshuman Khandual wrote: >>>> TRBE implementations affected by Arm erratum #2038923 might get TRBE into >>>> an inconsistent view on whether trace is prohibited within the CPU. As a >>>> result, the trace buffer or trace buffer state might be corrupted. This >>>> happens after TRBE buffer has been enabled by setting TRBLIMITR_EL1.E, >>>> followed by just a single context synchronization event before execution >>>> changes from a context, in which trace is prohibited to one where it isn't, >>>> or vice versa. In these mentioned conditions, the view of whether trace is >>>> prohibited is inconsistent between parts of the CPU, and the trace buffer >>>> or the trace buffer state might be corrupted. >>>> >>>> Work around this problem in the TRBE driver by preventing an inconsistent >>>> view of whether the trace is prohibited or not based on TRBLIMITR_EL1.E by >>>> immediately following a change to TRBLIMITR_EL1.E with at least one ISB >>>> instruction before an ERET, or two ISB instructions if no ERET is to take >>>> place. This adds a new cpu errata in arm64 errata framework and also >>>> updates TRBE driver as required. >>>> >>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx> >>>> Cc: Will Deacon <will@xxxxxxxxxx> >>>> Cc: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> >>>> Cc: Suzuki Poulose <suzuki.poulose@xxxxxxx> >>>> Cc: coresight@xxxxxxxxxxxxxxxx >>>> Cc: linux-doc@xxxxxxxxxxxxxxx >>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx >>>> Cc: linux-kernel@xxxxxxxxxxxxxxx >>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx> >>>> --- >>>> Documentation/arm64/silicon-errata.rst | 2 + >>>> arch/arm64/Kconfig | 23 ++++++++++ >>>> arch/arm64/kernel/cpu_errata.c | 9 ++++ >>>> arch/arm64/tools/cpucaps | 1 + >>>> drivers/hwtracing/coresight/coresight-trbe.c | 47 +++++++++++++++----- >>>> 5 files changed, 72 insertions(+), 10 deletions(-) >>> >>> As with the previous patch, it may be a good idea to split the >>> patch to arm64 and trbe parts. >> >> Sure, will do. >> >>> >>>> >>>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst >>>> index c9b30e6c2b6c..e0ef3e9a4b8b 100644 >>>> --- a/Documentation/arm64/silicon-errata.rst >>>> +++ b/Documentation/arm64/silicon-errata.rst >>>> @@ -54,6 +54,8 @@ stable kernels. >>>> +----------------+-----------------+-----------------+-----------------------------+ >>>> | ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 | >>>> +----------------+-----------------+-----------------+-----------------------------+ >>>> +| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 | >>>> ++----------------+-----------------+-----------------+-----------------------------+ >>>> | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | >>>> +----------------+-----------------+-----------------+-----------------------------+ >>>> | ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | >>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >>>> index 2105b68d88db..026e34fb6fad 100644 >>>> --- a/arch/arm64/Kconfig >>>> +++ b/arch/arm64/Kconfig >>>> @@ -796,6 +796,29 @@ config ARM64_ERRATUM_2064142 >>>> If unsure, say Y. >>>> +config ARM64_ERRATUM_2038923 >>>> + bool "Cortex-A510: 2038923: workaround TRBE corruption with enable" >>>> + depends on CORESIGHT_TRBE >>>> + default y >>>> + help >>>> + This option adds the workaround for ARM Cortex-A510 erratum 2038923. >>>> + >>>> + Affected Cortex-A510 core might cause an inconsistent view on whether trace is >>>> + prohibited within the CPU. As a result, the trace buffer or trace buffer state >>>> + might be corrupted. This happens after TRBE buffer has been enabled by setting >>>> + TRBLIMITR_EL1.E, followed by just a single context synchronization event before >>>> + execution changes from a context, in which trace is prohibited to one where it >>>> + isn't, or vice versa. In these mentioned conditions, the view of whether trace >>>> + is prohibited is inconsistent between parts of the CPU, and the trace buffer or >>>> + the trace buffer state might be corrupted. >>>> + >>>> + Work around this in the driver by preventing an inconsistent view of whether the >>>> + trace is prohibited or not based on TRBLIMITR_EL1.E by immediately following a >>>> + change to TRBLIMITR_EL1.E with at least one ISB instruction before an ERET, or >>>> + two ISB instructions if no ERET is to take place. >>>> + >>>> + If unsure, say Y. >>>> + >>>> config CAVIUM_ERRATUM_22375 >>>> bool "Cavium erratum 22375, 24313" >>>> default y >>>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c >>>> index cbb7d5a9aee7..60b0c1f1d912 100644 >>>> --- a/arch/arm64/kernel/cpu_errata.c >>>> +++ b/arch/arm64/kernel/cpu_errata.c >>>> @@ -607,6 +607,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = { >>>> ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2) >>>> }, >>>> #endif >>>> +#ifdef CONFIG_ARM64_ERRATUM_2038923 >>>> + { >>>> + .desc = "ARM erratum 2038923", >>>> + .capability = ARM64_WORKAROUND_2038923, >>>> + >>>> + /* Cortex-A510 r0p0 - r0p2 */ >>>> + ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2) >>>> + }, >>>> +#endif >>>> { >>>> } >>>> }; >>>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps >>>> index fca3cb329e1d..45a06d36d080 100644 >>>> --- a/arch/arm64/tools/cpucaps >>>> +++ b/arch/arm64/tools/cpucaps >>>> @@ -56,6 +56,7 @@ WORKAROUND_1463225 >>>> WORKAROUND_1508412 >>>> WORKAROUND_1542419 >>>> WORKAROUND_2064142 >>>> +WORKAROUND_2038923 >>>> WORKAROUND_TRBE_OVERWRITE_FILL_MODE >>>> WORKAROUND_TSB_FLUSH_FAILURE >>>> WORKAROUND_TRBE_WRITE_OUT_OF_RANGE >>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c >>>> index ec24b62b2cec..0689c6dab96d 100644 >>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c >>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c >>>> @@ -92,11 +92,13 @@ struct trbe_buf { >>>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0 >>>> #define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1 >>>> #define TRBE_WORKAROUND_SYSREG_WRITE_FAILURE 2 >>>> +#define TRBE_WORKAROUND_CORRUPTION_WITH_ENABLE 3 >>>> static int trbe_errata_cpucaps[] = { >>>> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, >>>> [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE, >>>> [TRBE_WORKAROUND_SYSREG_WRITE_FAILURE] = ARM64_WORKAROUND_2064142, >>>> + [TRBE_WORKAROUND_CORRUPTION_WITH_ENABLE] = ARM64_WORKAROUND_2038923, >>>> -1, /* Sentinel, must be the last entry */ >>>> }; >>>> @@ -174,6 +176,11 @@ static inline bool trbe_may_fail_sysreg_write(struct trbe_cpudata *cpudata) >>>> return trbe_has_erratum(cpudata, TRBE_WORKAROUND_SYSREG_WRITE_FAILURE); >>>> } >>>> +static inline bool trbe_may_corrupt_with_enable(struct trbe_cpudata *cpudata) >>>> +{ >>> >>> minor nit: trbe_needs_{ctxt_sync, isb}_after_enable() ? >> >> trbe_needs_ctxt_sync_after_enable() sounds better. Also will have to change >> the index above as well .. TRBE_NEEDS_CTXT_SYNC_AFTER_ENABLE. >> >>> >>>> + return trbe_has_erratum(cpudata, TRBE_WORKAROUND_CORRUPTION_WITH_ENABLE); >>>> +} >>>> + >>>> static int trbe_alloc_node(struct perf_event *event) >>>> { >>>> if (event->cpu == -1) >>>> @@ -187,6 +194,30 @@ static inline void trbe_drain_buffer(void) >>>> dsb(nsh); >>>> } >>>> +static inline void set_trbe_enabled(struct trbe_cpudata *cpudata) >>>> +{ >>>> + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); >>> >>> minor nit: This implies we do the TRBE programming in the following >>> manner in the common case (i.e, TRBE enabled in the beginning of a >>> session). >>> -> set TRBE LIMIT >>> -> read TRBE LIMIT >>> -> set TRBE ENABLED >>> >>> Could we please optimize this ? I believe the buf->trbe_limit >>> must hold the LIMITR value at any point in time. And thus this >> >> But is not bit risky though ! We have got the following places where >> given trbe_limit instance changes its value. >> >> drivers/../coresight-trbe.c: buf->trbe_limit = buf->trbe_base + nr_pages * PAGE_SIZE; >> drivers/../coresight-trbe.c: buf->trbe_limit = compute_trbe_buffer_limit(handle); >> drivers/../coresight-trbe.c: buf->trbe_limit -= PAGE_SIZE; > > Those are the places where we compute the trbe_limit, *before* > we enable the TRBE. And we don't change recompute the limit > *without disabling* the TRBE. To make it more clear, the > only place where we set TRBE enabled without "computing" > the trbe_limit is when we hit a spurious interrupt. > But the value in the TRBLIMITR should already match the > buf->trbe_limit and we are only going to re-enable the > TRBE with the same limit. The other option is to > pass down the "limit" to the set_trbe_enabled(). Since there are just two instances where set_trbe_enabled() gets called, passing down an additional parameter 'trblimitr' should still be okay. Some additional code change (like the following) will achieve this. Does this look okay ? --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -201,10 +201,8 @@ static inline void trbe_drain_buffer(void) dsb(nsh); } -static inline void set_trbe_enabled(struct trbe_cpudata *cpudata) +static inline void set_trbe_enabled(struct trbe_cpudata *cpudata, u64 trblimitr) { - u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); - /* * Enable the TRBE without clearing LIMITPTR which * might be required for fetching the buffer limits. @@ -626,7 +624,7 @@ static void set_trbe_limit_pointer_enabled(struct trbe_buf *buf) trblimitr |= (addr & PAGE_MASK); write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); - set_trbe_enabled(buf->cpudata); + set_trbe_enabled(buf->cpudata, trblimitr); } static void trbe_enable_hw(struct trbe_buf *buf) @@ -1050,13 +1048,14 @@ static int arm_trbe_disable(struct coresight_device *csdev) static void trbe_handle_spurious(struct perf_output_handle *handle) { struct trbe_buf *buf = etm_perf_sink_config(handle); + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); /* * If the IRQ was spurious, simply re-enable the TRBE * back without modifying the buffer parameters to * retain the trace collected so far. */ - set_trbe_enabled(buf->cpudata); + set_trbe_enabled(buf->cpudata, trblimitr); } static int trbe_handle_overflow(struct perf_output_handle *handle) > >> >>> function could simply be : >>> >>> set_trbe_enabled(trbe_buf) >>> { >>> limitr = trbe_buf->limit | LIMITR_ENABLE >>> write(limitr, TRBLIMITR_EL1); >>> ... >>> } >> >> Is the potential for performance improvement here, out weigh possible >> risks of using buf->trbe_limit directly while enabling the TRBE ? > > I somehow don't like the fact that we have additional write and read > for the most common case of the TRBE usage (i.e, for arm_trbe_enable()). > If we could avoid that, that may be better. > > Cheers > Suzuki