Add documentation about the preemption feature supported by the msm driver. Signed-off-by: Antonino Maniscalco <antomani103@xxxxxxxxx> --- Documentation/gpu/msm-preemption.rst | 99 ++++++++++++++++++++++++++++++++++++ 1 file changed, 99 insertions(+) diff --git a/Documentation/gpu/msm-preemption.rst b/Documentation/gpu/msm-preemption.rst new file mode 100644 index 0000000000000000000000000000000000000000..e9b5d7bc7dc6c7740a0e8ba75d3a74e9ad267b90 --- /dev/null +++ b/Documentation/gpu/msm-preemption.rst @@ -0,0 +1,99 @@ +.. SPDX-License-Identifier: GPL-2.0 + +:orphan: + +============= +MSM Preemtion +============= + +Preemption allows Adreno GPUs to switch to an higher priority ring when work is +pushed to it, reducing latency for high priority submissions. + +When preemption is enabled 4 rings are initialized, corresponding to different +priority levels. Having multiple rings is purely a software concept as the GPU +only has registers to keep track of one graphics ring. +The kernel is able to switch which ring is currently being processed by +requesting preemption. When certain conditions are met, depending on the +priority level, the GPU will save its current state in a series of buffers, +then restores state from a similar set of buffers specified by the kernel. It +then resumes execution and fires an IRQ to let the kernel know the context +switch has completed. + +This mechanism can be used by the kernel to switch between rings. Whenever a +submission occurs the kernel finds the highest priority ring which isn't empty +and preempts to it if said ring is not the one being currently executed. This is +also done whenever a submission completes to make sure execution resumes on a +lower priority ring when a higher priority ring is done. + +Preemption levels +----------------- + +Preemption can only occur at certain boundaries. The exact conditions can be +configured by changing the preemption level, this allows to compromise between +latency (ie. the time that passes between when the kernel requests preemption +and when the SQE begins saving state) and overhead (the amount of state that +needs to be saved). + +The GPU offers 3 levels: + +Level 0 + Preemption only occurs at the submission level. This requires the least amount + of state to be saved as the execution of userspace submitted IBs is never + interrupted, however it offers very little benefit compared to not enabling + preemption of any kind. + +Level 1 + Preemption occurs at either bin level, if using GMEM rendering, or draw level + in the sysmem rendering case. + +Level 2 + Preemption occurs at draw level. + +Level 1 is the mode that is used by the msm driver. + +Additionally the GPU allows to specify a `skip_save_restore` option. This +disables the saving and restoring of all registers except those relating to the +operation of the SQE itself, reducing overhead. Saving and restoring is only +skipped when using GMEM with Level 1 preemption. When enabling this userspace is +expected to set the state that isn't preserved whenever preemption occurs which +is done by specifying preamble and postambles. Those are IBs that are executed +before and after preemption. + +Preemption buffers +------------------ + +A series of buffers are necessary to store the state of rings while they are not +being executed. There are different kinds of preemption records and most of +those require one buffer per ring. This is because preemption never occurs +between submissions on the same ring, which always run in sequence when the ring +is active. This means that only one context per ring is effectively active. + +SMMU_INFO + This buffer contains info about the current SMMU configuration such as the + ttbr0 register. The SQE firmware isn't actually able to save this record. + As a result SMMU info must be saved manually from the CP to a buffer and the + SMMU record updated with info from said buffer before triggering + preemption. + +NON_SECURE + This is the main preemption record where most state is saved. It is mostly + opaque to the kernel except for the first few words that must be initialized + by the kernel. + +SECURE + This saves state related to the GPU's secure mode. + +NON_PRIV + The intended purpose of this record is unknown. The SQE firmware actually + ignores it and therefore msm doesn't handle it. + +COUNTER + This record is used to save and restore performance counters. + +Handling the permissions of those buffers is critical for security. All but the +NON_PRIV records need to be inaccessible from userspace, so they must be mapped +in the kernel address space with the MSM_BO_MAP_PRIV flag. +For example, making the NON_SECURE record accessible from userspace would allow +any process to manipulate a saved ring's RPTR which can be used to skip the +execution of some packets in a ring and execute user commands with higher +privileges. -- 2.46.1