Document the introduction and usage of HiSilicon PTT device driver. Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx> --- Documentation/trace/hisi-ptt.rst | 326 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 326 insertions(+) create mode 100644 Documentation/trace/hisi-ptt.rst diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst new file mode 100644 index 0000000..f093846 --- /dev/null +++ b/Documentation/trace/hisi-ptt.rst @@ -0,0 +1,326 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================== +HiSilicon PCIe Tune and Trace device +====================================== + +Introduction +============ + +HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex +integrated Endpoint (RCiEP) device, providing the capability +to dynamically monitor and tune the PCIe link's events (tune), +and trace the TLP headers (trace). The two functions are independent, +but is recommended to use them together to analyze and enhance the +PCIe link's performance. + +On Kunpeng 930 SoC, the PCIe Root Complex is composed of several +PCIe cores. Each PCIe core includes several Root Ports and a PTT +RCiEP, like below. The PTT device is capable of tuning and +tracing the link of the PCIe core. +:: + +--------------Core 0-------+ + | | [ PTT ] | + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + Root Complex |------Core 1-------+ + | | [ PTT ] | + | | [Root Port]---[ Switch ]---[Endpoint] + | | [Root Port]---[Endpoint] `-[Endpoint] + | | [Root Port]---[Endpoint] + +---------------------------+ + +The PTT device driver cannot be loaded if debugfs is not mounted. +Each PTT device will be presented under /sys/kernel/debugfs/hisi_ptt +as its root directory, with name of its BDF number. +:: + + /sys/kernel/debug/hisi_ptt/<domain>:<bus>:<device>.<function> + +Tune +==== + +PTT tune is designed for monitoring and adjusting PCIe link parameters (events). +Currently we support events in 4 classes. The scope of the events +covers the PCIe core to which the PTT device belongs. + +Each event is presented as a file under $(PTT root dir)/$(BDF)/tune, and +mostly a simple open/read/write/close cycle will be used to tune +the event. +:: + $ cd /sys/kernel/debug/hisi_ptt/$(BDF)/tune + $ ls + qos_tx_cpl qos_tx_np qos_tx_p + tx_path_rx_req_alloc_buf_level + tx_path_tx_req_alloc_buf_level + $ cat qos_tx_dp + 1 + $ echo 2 > qos_tx_dp + $ cat qos_tx_dp + 2 + +Current value (numerical value) of the event can be simply read +from the file, and the desired value written to the file to tune. + +1. Tx path QoS control +------------------------ + +The following files are provided to tune the QoS of the tx path of +the PCIe core. + +- qos_tx_cpl: weight of Tx completion TLPs +- qos_tx_np: weight of Tx non-posted TLPs +- qos_tx_p: weight of Tx posted TLPs + +The weight influences the proportion of certain packets on the PCIe link. +For example, for the storage scenario, increase the proportion +of the completion packets on the link to enhance the performance as +more completions are consumed. + +The available tune data of these events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +2. Tx path buffer control +------------------------- + +Following files are provided to tune the buffer of tx path of the PCIe core. + +- tx_path_rx_req_alloc_buf_level: watermark of Rx requested +- tx_path_tx_req_alloc_buf_level: watermark of Tx requested + +These events influence the watermark of the buffer allocated for each +type. Rx means the inbound while Tx means outbound. The packets will +be stored in the buffer first and then posted either when the watermark +reached or when timed out. For a busy direction, you should increase +the related buffer watermark to avoid frequently posting and thus +enhance the performance. In most cases just keep the default value. + +The available tune data of above events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +Trace +===== + +PTT trace is designed for dumping the TLP headers to the memory, which +can be used to analyze the transactions and usage condition of the PCIe +Link. You can choose to filter the traced headers by either requester ID, +or those downstream of a set of Root Ports on the same core of the PTT +device. It's also supported to trace the headers of certain type and of +certain direction. + +In order to start trace, you need to configure the parameters first. +The parameters files are provided under $(PTT root dir)/$(BDF)/trace. +:: + $ cd /sys/kernel/debug/hisi_ptt/$(BDF)/trace + $ ls + free_buffer filter buflet_nums buflet_size + direction type data trace_on + data_format + +1. filter +--------- + +You can configure the filter of TLP headers through the file. The filter +is provided as BDF numbers of either Root Port or subordinates, which +belong to the same PCIe core. You can get the filters available and +currently configured by reading the file, and write the desired BDF to +the file to set the filters. There is no default filter, which means you +must specify at least one filter before starting tracing. Writing an +invalid BDF (not in the available list) will return a failure. +:: + $ echo 0000:80:04.0 > filter + $ cat filter + #### Root Ports #### + 0000:80:00.0 + [0000:80:04.0] + #### Functions #### + 0000:81:00.0 + 0000:81:00.1 + 0000:82:00.0 + +Note that multiple Root Ports can be specified at one time, but only +one Endpoint function can be specified in one trace. Specifying both +Root Port and function at the same time is not supported. + +If no filter is available, reading the filter will get the hint. +:: + $ cat filter + #### No available filter #### + +The filter can be dynamically updated, which means you can always +get correct filter information when hotplug events happen, or +when you manually remove/rescan the devices. + +2. type +------- + +You can trace the TLP headers of certain types by configuring the file. +Reading the file will get available types and current setting, and write +the desired type to the file to configure. The default type is +`posted_request`. Write the types not in the available list will return +a failure. +:: + $ echo completion > type + $ cat type + all posted_request non-posted_request [completion] + +3. direction +------------ + +You can trace the TLP headers from certain direction, which is relative +to the Root Port or the PCIe core. Read the file to get available +directions and current configurition, and write the desired direction +to configure. The default value is `rx` and any invalid direction will +return a failure. Note `rxtx_no_dma_p2p` means the headers of both +directions, but not include P2P DMA access. +:: + $ echo rxtx > direction + $ cat direction + rx tx [rxtx] rxtx_no_dma_p2p + +4. buflet_size +-------------- + +The traced TLP headers will be written to the memory allocated +by the driver. The hardware accepts 4 DMA address with same size, +and writes the buflet sequentially like below. If DMA addr 3 is +finished and the trace is still on, it will return to addr 0. +Driver will allocate each DMA buffer (we call it buflet). +The finished buflet will be replaced with a new one, so +a long time trace can be achieved. +:: + +->[DMA addr 0]->[DMA addr 1]->[DMA addr 2]->[DMA addr 3]-+ + +---------------------------------------------------------+ + +You should both configure the buflet_size and buflet_nums to +configure the `trace buffer` to receive the TLP headers. The +total trace buffer size is buflet_size * buflet_nums. Note +that the trace buffer will not be allocated immediately after you +configure the parameters, but will be allocated right before +the trace starts. + +This file configures the buflet size. Read the file will get +available buflet size and size set currently; write the desired +size to the file to configure. The default size is 2 MiB and any +invalid size written will return a failure. +:: + $ cat buflet_size + [2 MiB] 4 MiB + $ echo 4 > buflet_size + $ cat buflet_size + 2 MiB [4 MiB] + +5. buflet_nums +-------------- + +You can write the desired buflet count to the file to configure, +and read the file to get current buflet count. The default +value is 64. Any positive value is valid. Note that big value +may lead to DMA memory allocation failure, and you will not be +able to start tracing. If it happens, you should consider adjusting +buflet_nums or buflet_size. +:: + $ cat buflet_nums + 64 + $ echo 128 > buflet_nums + $ cat buflet_nums + 128 + +6. data +------- + +The file to access the traced data. You can read the file to get the +binary blob of traced TLP headers. The format of the headers is +4 Dword length and is just as defined by the PCIe Spec r4.0, +Sec 2.2.4.1, or 8 Dword length with additional 4 Dword extra +information. + +echo "" > data will free all the trace buffers allocated as well as +the traced datas. + +7. trace_on +----------- + +Start or end the trace by writing to the file, and monitor the trace +status by reading the file. +:: + $ echo 1 > trace_on # start trace + $ cat trace_on # get the trace status + 1 + $ echo 0 > trace_on # manually end trace + +The read value of the trace_on will be auto cleared if the buffer +allocated is full. 1 indicates the trace is running and 0 for +stopped. Writing any non-zero value to the file starts tracing. + +8. free_buffer +-------------- + +File to indicate the trace buffer status and to manually free the +trace buffer. The read value of 1 indicates the trace buffer has +been allocated and exists in the memory, while 0 indicates there +is no buffer allocated. Write 1 to the file to free the trace +buffer as well as the traced data. +:: + $ cat free_buffer + 1 # indicate the buffer exists + $ echo 1 > free_buffer # free the trace buffer + $ cat free_buffer + 0 + +9. data_format +-------------- + +File to indicate the format of the traced TLP headers. User can +specify the desired format of traced TLP headers. Available formats +are 4DW, 8DW which indicates the length of each TLP headers traced. +:: + $ cat data_format + [4DW] 8DW + $ echo 8 > data_format + $ cat data_format + 4DW [8DW] + +The traced TLP header format is different from the PCIe standard. + +When using the 8DW data format, the entire TLP header is logged +(Header DW0-3 shown below). For example, the TLP header for Memory +Reads with 64-bit addresses is shown in PCIe r5.0, Figure 2-17; +the header for Configuration Requests is shown in Figure 2.20, etc. + +In addition, 8DW trace buffer entries contain a timestamp and +possibly a prefix for a PASID TLP prefix (see Figure 6-20). +Otherwise this field will be all 0. + +The bit[31:11] of DW0 is always 0x1fffff, which can be +used to distinguish the data format. 8DW format is like +:: + bits [ 31:11 ][ 10:0 ] + |---------------------------------------|-------------------| + DW0 [ 0x1fffff ][ Reserved (0x7ff) ] + DW1 [ Prefix ] + DW2 [ Header DW0 ] + DW3 [ Header DW1 ] + DW4 [ Header DW2 ] + DW5 [ Header DW3 ] + DW6 [ Reserved (0x0) ] + DW7 [ Time ] + +When using the 4DW data format, DW0 of the trace buffer entry +contains selected fields of DW0 of the TLP, together with a +timestamp. DW1-DW3 of the trace buffer entry contain DW1-DW3 +directly from the TLP header. + +4DW format is like +:: + bits [31:30] [ 29:25 ][24][23][22][21][ 20:11 ][ 10:0 ] + |-----|---------|---|---|---|---|-------------|-------------| + DW0 [ Fmt ][ Type ][T9][T8][TH][SO][ Length ][ Time ] + DW1 [ Header DW1 ] + DW2 [ Header DW2 ] + DW3 [ Header DW3 ] -- 2.8.1