On 21/11/24 10:18AM, Jonathan Cameron wrote:
Add basic documentation to describe the CXL HMU and the perf AUX buffer based interfaces. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> --- Documentation/trace/cxl-hmu.rst | 197 ++++++++++++++++++++++++++++++++ Documentation/trace/index.rst | 1 + 2 files changed, 198 insertions(+) diff --git a/Documentation/trace/cxl-hmu.rst b/Documentation/trace/cxl-hmu.rst new file mode 100644 index 000000000000..f07a50ba608c --- /dev/null +++ b/Documentation/trace/cxl-hmu.rst @@ -0,0 +1,197 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================== +CXL Hotness Monitoring Unit Driver +================================== + +CXL r3.2 introduced the CXL Hotness Monitoring Unit (CHMU). A CHMU allows +software running on a CXL Host to identify hot memory ranges, that is those with +higher access frequency relative to other memory ranges. + +A given Logical Device (presentation of a CXL memory device seen by a particular +host) can provide 1 or more CHMU each of which supports 1 or more separately +programmable CHMU Instances (CHMUI). These CHMUI are mostly independent with +the exception that there can be restrictions on them tracking the same memory +regions. The CHMUs are always completely independent. +The naming of the units is cxl_hmu_memX.Y.Z where memX matches the naming +of the memory device in /sys/bus/cxl/devices/, Y is the CHMU index and +Z is the CHMUI index with the CHMU. + +Each CHMUI provides a ring buffer structure known as the Hot List from which the +host an read back entries that describe the hotness of particular region of +memory (Hot List Units). The Hot List Unit combines a Unit Address and an access +count for the particular address. Unit address to DPA requires multiplication +by the unit size. Thus, for large unit sizes the device may support higher +counts. It is these Hot List Units that the driver provides via a perf AUX +buffer by copying them from PCI BAR space. + +The unit size at which hotness is measured is configurable for each CHMUI and +all measurement is done in Device Physical Address space. To relate this to +Host Physical Address space the HDM (Host-Managed Device Memory) decoder +configuration must be taken into account to reflect the placement in a +CXL Fixed Memory Window and any interleaving. + +The CHMUI can support interrupts on fills above a watermark, or on overflow +of the hotlist. + +A CHMUI can support two different basic modes of operation. Epoch and +Always On. These affect what is placed on the hotlist. Note that the actual +implementation of tracking is implementation defined and likely to be +inherently imprecise in that the hottest pages may not be discovered due to +resource exhaustion and the hotness counts may not represent accurately how +hot they are. The specification allows for a very high degree of flexibility +in implementation, important as it is likely that a number of different +hardware implementations will be chosen to suit particular silicon and accuracy +budgets. + +Operation and configuration +=========================== + +An example command line is:: + + $perf record -a -e cxl_hmu_mem0.0.0/epoch_type=0,access_type=6,\ + hotness_threshold=1024,epoch_multiplier=4,epoch_scale=4,range_base=0,\ + range_size=1024,randomized_downsampling=0,downsampling_factor=32,\ + hotness_granual=12 + + $perf report --dump-raw-traces
Typo: --dump-raw-trace
+ +which will produce a list of hotlist entries, one per line with a short header +to provide sufficient information to interpret the entries:: + + . ... CXL_HMU data: size 33512 bytes + Header 0: units: 29c counter_width 10 + Header 1 : deadbeef + 0000000000000283 + 0000000000010364 + 0000000000020366 + 000000000003033c + 0000000000040343 + 00000000000502ff + 000000000006030d + 000000000007031a + ... + +The least significant counter_width bits (here 16, hex 10) are the counter +value, all higher bits are the unit index. Multiply by the unit size +to get a Device Physical Address. + +The parameters are as follows: + +epoch_type +---------- + +Two values may be supported:: + + 0 - Epoch based operation + 1 - Always on operation + + +0. Epoch Based Operation +~~~~~~~~~~~~~~~~~~~~~~~~ + +An Epoch is a period of time after which a counter is assessed for hotness. + +The device may have a global sense of an Epoch but it may also operate them on +a per counter, or per region of device basis. This is a function of the +implementation and is not controllable, but is discoverable. In a global Epoch +scheme at start of each Epoch all counters are zeroed / deallocated. Counters +are then allocated in a hardware specific manner and accesses counted. At the +completion of the Epoch the counters are compared with a threshold and entries +with a count above a configurable threshold are added to the hotlist. A new +Epoch is then begun with all counters cleared. + +In non-global Epoch scheme, when the Epoch of a given counter begins is not +specified. An example might be an Epoch for counter only starting on first +touch to the relevant memory region. When a local Epoch ends the counter is +compared to the threshold and if appropriate added to the hotlist. + +Note, in Epoch Based Operation, the counter in the hotlist entry provides +information on how hot the memory is as the counter for the full Epoch is +provided. + +1. Always on Operation +~~~~~~~~~~~~~~~~~~~~~~ + +In this mode, counters may all be reset before enabling the CHMUI. Then +counters are allocated to particular memory units via an hardware specific +method, perhaps on first touch. When a counter passes the configurable +hotness threshold an entry is added to the hotlist and that counter is freed +for reuse. + +In this scheme the count provided in the hotlist entry is not useful as it will +depend only on the configured threshold. + +access_type +----------- + +The parameter controls which access are counted:: + + 1 - Non-TEE read only + 2 - Non-TEE write only + 3 - Non-TEE read and write + 4 - TEE and Non-TEE read only + 5 - TEE and Non-TEE write only + 6 - TEE and Non-tee read and write + + +TEE here refers to a trusted execution environment, specifically one that +results in the T bit being set in the CXL transactions. + + +hotness_granual +--------------- + +Unit size at which tracking is performed. Must be at least 256 bytes but +hardware may only support some sizes. Expressed as a power of 2. e.g. 12 = 4kiB. + +hotness_threshold +----------------- + +This is the minimum counter value that must be reached for the unit to count as +hot and be added to the hotlist. + +The possible range may be dependent on the unit size as a larger unit size +requires more bits on the hotlist entry leaving fewer available for the hotness +counter. + +epoch_multiplier and epoch_scale +-------------------------------- + +The length of an epoch (in epoch mode) is controlled by these two parameters +with the decoded epoch_scale multiplied by the epoch_multiplier to give the +overall epoch length. + +epoch_scale:: + + 1 - 100 usecs + 2 - 1 msec + 3 - 10 msecs + 4 - 100 msecs + 5 - 1 second + +range_base and range_scale +-------------------------- + +Expressed in terms of the unit size set via hotness_granual. Each CHMUI has a +bitmap that controls what Device Physical Address spaces is tracked. Each bit +represents 256MiB of DPA space. + +This interface provides a simple base and size in units of 256MiB to configure +this bitmap. All bits in the specified range will be set. + +downsampling_factor +------------------- + +Hardware may be incapable of counting accesses at full speed or it may be +desirable to count over a longer period during which the counters would +overflow. This control allows selection of a down sampling factor expressed +as a power of 2 between 1 and 32768. Default is minimum supported downsampling +factor. + +randomized_downsampling +----------------------- + +To avoid problems with downsampling when accesses are periodic this option +allows for an implementation defined randomization of the sampling interval, +whilst remaining close to the specified downsampling_factor. diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 0b300901fd75..b35ed8e9dfa9 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -36,3 +36,4 @@ Linux Tracing Technologies user_events rv/index hisi-ptt + cxl-hmu -- 2.43.0