Re: [PATCH v1 0/4] coresight: ctcu: Enable byte-cntr function for TMC ETR

Jie Gan <quic_jiegan@xxxxxxxxxxx> · Thu, 13 Mar 2025 14:15:42 +0800

On 3/12/2025 9:22 PM, Mike Leach wrote:
Hi,

On Mon, 10 Mar 2025 at 09:05, Jie Gan <quic_jiegan@xxxxxxxxxxx> wrote:

From: Jie Gan <jie.gan@xxxxxxxxxxxxxxxx>

The byte-cntr function provided by the CTCU device is used to transfer data
from the ETR buffer to userspace. An interrupt is triggered if the data
size exceeds the threshold set in the BYTECNTRVAL register. The interrupt
handler counts the number of triggered interrupts, and the read function
reads data from the ETR buffer if the IRQ count is greater than 0.
Each successful read decrements the IRQ count by 1.
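
As a rough illustration of the IRQ-count handshake described above, here is
a small userspace model in Python. It is only a sketch, not the driver code:
the class and method names are hypothetical, and it assumes that one
interrupt fires each time a full threshold's worth of data has accumulated.

```python
class ByteCntrModel:
    """Toy model of the IRQ-count handshake (hypothetical, not driver code)."""

    def __init__(self, threshold):
        self.threshold = threshold  # stands in for the BYTECNTRVAL value
        self.pending = 0            # bytes written since the last interrupt
        self.irq_count = 0          # interrupts not yet consumed by a read

    def hw_write(self, nbytes):
        """Account for nbytes of trace data.

        Assumption: one interrupt per full threshold of data; the exact
        hardware trigger condition is not spelled out in the cover letter.
        """
        self.pending += nbytes
        while self.pending >= self.threshold:
            self.pending -= self.threshold
            self.irq_count += 1

    def read(self):
        """Return how many bytes a read may consume, or 0 if no IRQ is pending."""
        if self.irq_count == 0:
            return 0
        self.irq_count -= 1  # each successful read decrements the count by 1
        return self.threshold
```

Note that the model tracks only counts. It says nothing about whether the
data at the read position is still the data that raised the interrupt.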

The byte-cntr function starts when the device node is opened for reading,
and the IRQ count is reset when the byte-cntr function stops. When
the file node is opened, the w_offset of the ETR buffer is read and
stored in byte_cntr_data, serving as the initial r_offset (indicating
where reading starts) for the byte-cntr function.

The work queue for the read operation will wake up once when ETR is stopped,
ensuring that the remaining data in the ETR buffer has been flushed based on
the w_offset read at the time of stopping.

The following shell commands write a threshold to the BYTECNTRVAL registers.

Only enable byte-cntr for ETR0:
echo 0x10000 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val

Enable byte-cntr for both ETR0 and ETR1 (both hex and decimal values are supported):
echo 0x10000 4096 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val

Setting the BYTECNTRVAL registers to 0 disables the byte-cntr function.
Disable byte-cntr for ETR0:
echo 0 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val

Disable byte-cntr for both ETR0 and ETR1:
echo 0 0 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val

There is a minimum threshold to prevent generating too many interrupts.
The minimum threshold is 4096 bytes. The write will fail if the user tries
to set a BYTECNTRVAL register to a value less than 4096 bytes (except
for 0).
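
The validation rules above (hex or decimal input, 0 to disable, otherwise at
least 4096) can be sketched as follows. This is an illustrative Python model
of the rules as stated, not the sysfs store handler from the patch;
parse_byte_cntr_val and max_etrs are hypothetical names.

```python
MIN_THRESHOLD = 4096  # minimum stated in the cover letter

def parse_byte_cntr_val(text, max_etrs=2):
    """Parse a string such as "0x10000 4096" into a list of thresholds.

    Raises ValueError for malformed input or for a nonzero value below
    the minimum. max_etrs=2 is an assumption based on the ETR0/ETR1 examples.
    """
    fields = text.split()
    if not 1 <= len(fields) <= max_etrs:
        raise ValueError("expected 1 to %d values" % max_etrs)
    values = []
    for field in fields:
        value = int(field, 0)  # base 0 accepts both "0x10000" and "4096"
        if value != 0 and value < MIN_THRESHOLD:
            raise ValueError("threshold %d is below %d" % (value, MIN_THRESHOLD))
        values.append(value)
    return values
```

For example, parse_byte_cntr_val("0x10000 4096") yields [65536, 4096], while
parse_byte_cntr_val("100") raises ValueError.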

Finally, the user can read data from the ETR buffer through the byte-cntr file
nodes located under /dev. For example, to read data from the ETR0 buffer:
cat /dev/byte-cntr0

To enable and start byte-cntr for ETR0:
echo 0x10000 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
echo 1 > /sys/bus/coresight/devices/etm0/enable_source
cat /dev/byte-cntr0

There is a significant issue with attempting to drain an ETR buffer
while it is live in the way you appear to be doing.

You have no way of knowing if the TMC hardware write pointer wraps and
overtakes the point where you are currently reading. This could cause
data corruption if the TMC writes while you are reading, or contention
for the buffer that affects the TMC write.

Even if neither of those events occurs, the trace capture sequence
can still be corrupted.

Take a simple example - suppose we split the buffer into 4 blocks of
trace, which are filled by the ETR

buffer = 1, 2, 3, 4

Now suppose you have read 1 & 2 into your userspace buffer / file.

file = 1, 2

If there is now some system event that prevents your userspace code
from running for a while, then it is possible that the ETR continues,
wraps and the buffer is now

buffer = 5, 6, 7, 4

Your next two reads will be 7, 4

file = 1, 2, 7, 4

This trace is now corrupt and will cause decode errors. There is no
way for the decoder to determine that the interface between blocks 2 &
7 is not correct. If you are fortunate, this issue will cause an
explicit decode error. If you are less fortunate, decode will continue
but produce inaccurate output, with no obvious way to detect the
inaccuracy.
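
The four-block example above can be replayed with a short Python simulation.
It is a toy model only: the slot granularity, the schedule format, and the
function name are made up for illustration, and real ETR hardware writes
bytes rather than whole numbered blocks.

```python
def simulate(num_slots, schedule):
    """Replay ("write", n) / ("read", n) steps on a circular buffer.

    Blocks are numbered 1, 2, 3, ... in the order the writer produces them.
    Returns (buffer, out), where out is what the reader collected.
    """
    buffer = [None] * num_slots
    w = 0            # writer slot index, wraps at num_slots
    r = 0            # reader slot index, wraps at num_slots
    next_block = 1
    out = []
    for op, count in schedule:
        for _ in range(count):
            if op == "write":
                buffer[w] = next_block  # overwrites unread data without warning
                next_block += 1
                w = (w + 1) % num_slots
            else:
                out.append(buffer[r])
                r = (r + 1) % num_slots
    return buffer, out

# Fill 4 blocks, read 2, reader stalls while 3 more are written, read 2 more.
buffer, out = simulate(4, [("write", 4), ("read", 2), ("write", 3), ("read", 2)])
print("buffer =", buffer)  # buffer = [5, 6, 7, 4]
print("file   =", out)     # file   = [1, 2, 7, 4]
```

Nothing in the reader's own state (r, or a count of pending reads) reveals
that blocks 3, 5 and 6 were lost between the second and third reads.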

We encountered this problem early in the development of the perf data
collection. Even though perf was stopping the trace to copy the
hardware buffer, it would concatenate unrelated trace blocks into the
perf userspace buffer, which initially caused decoding errors. This is
now mitigated in perf by marking boundaries and recording indexes of
the boundaries, so the tool can reset the decoder at the start of
non-contiguous blocks.
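
The general idea of recording boundaries can be sketched like this. To be
clear, this is not perf's actual record format or code; it is a minimal
Python illustration with hypothetical helper names, assuming each chunk was
copied out while the trace was stopped and is therefore internally
contiguous.

```python
def append_chunks(chunks):
    """Concatenate trace chunks and record the offset where each one starts."""
    data = bytearray()
    boundaries = []
    for chunk in chunks:
        boundaries.append(len(data))  # a decoder must be reset at this offset
        data += chunk
    return bytes(data), boundaries

def split_for_decode(data, boundaries):
    """Return each contiguous region so a decoder can be reset between them."""
    ends = boundaries[1:] + [len(data)]
    return [data[start:end] for start, end in zip(boundaries, ends)]
```

A decoder would then process each returned region independently instead of
treating the concatenated data as one continuous stream.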

If you do not stop the TMC when draining the ETR buffer, you have no
way of determining if this has occurred.

Clearly, using large buffers split into smaller blocks can reduce
the likelihood of a wrap of this kind, but can never eliminate it,
especially given the extreme rate at which trace data can be generated.

Hi Mike,

Thanks for the detailed explanation. It's clear and makes sense to me.

I will look for another reasonable solution.

Thanks,
Jie

Regards

Mike

Jie Gan (4):
   coresight: tmc: Introduce new APIs to get the RWP offset of ETR buffer
   dt-bindings: arm: Add an interrupt property for Coresight CTCU
   coresight: ctcu: Enable byte-cntr for TMC ETR devices
   arm64: dts: qcom: sa8775p: Add interrupts to CTCU device

  .../bindings/arm/qcom,coresight-ctcu.yaml     |  17 +
  arch/arm64/boot/dts/qcom/sa8775p.dtsi         |   5 +
  drivers/hwtracing/coresight/Makefile          |   2 +-
  .../coresight/coresight-ctcu-byte-cntr.c      | 339 ++++++++++++++++++
  .../hwtracing/coresight/coresight-ctcu-core.c |  96 ++++-
  drivers/hwtracing/coresight/coresight-ctcu.h  |  59 ++-
  .../hwtracing/coresight/coresight-tmc-etr.c   |  45 ++-
  drivers/hwtracing/coresight/coresight-tmc.h   |   3 +
  8 files changed, 556 insertions(+), 10 deletions(-)
  create mode 100644 drivers/hwtracing/coresight/coresight-ctcu-byte-cntr.c

--
2.34.1