This patchset implements the basic infrastructure for a new type of V3D job, a CPU job. A CPU job is a job that requires CPU intervention. It would be nice to perform this operations on the kernel space as we can attach multiple in/out syncobjs to it. Why we want a CPU job on the kernel? ==================================== There are some Vulkan commands that cannot be performed by the GPU, so we implement those as CPU jobs on Mesa. But to synchronize a CPU job in the user space, we need to hold part of the command submission flow in order to correctly synchronize their execution. By moving the CPU job to the kernel, we can make use of the DRM schedule queues and all the advantages it brings with it. This way, instead of stalling the submission thread, we can use syncobjs to synchronize the job, providing a more effective management. About the implementation ======================== After we decided that we would like to have a CPU job implementation in the kernel, we could think about two possible implementations for this job: creating an IOCTL for each type of CPU job or using an user extension to provide a polymorphic behavior to a single CPU job IOCTL. We decided for the latter one. We have different types of CPU jobs (indirect CSD jobs, timestamp query jobs, copy query results jobs...) and each of them have a common infrastructure, but perform different operations. Therefore, by using a single IOCTL that is extended by an user extension, we can reuse the common infrastructure - avoiding code repetition - and yet use the user extension ID to identify the type of job and depending on the type of job, perform a certain operation. About the patchset ================== This patchset introduces the basic infrastructure of a CPU job with a new V3D queue (V3D_CPU) e new tracers. Moreover, it introduces six types of CPU jobs: an indirect CSD job, a timestamp query job, a reset timestamp queries job, a copy timestamp query results job, a reset performance queries job, and a copy performance query results job. An indirect CSD job is a job that, when executed in the queue, will map the indirect buffer, read the dispatch parameters, and submit a regular dispatch. So, the CSD job depends on the CPU job execution. We attach the wait dependencies to the CPU job and once they are satisfied, we read the dispatch parameters, rewrite the uniforms (if needed) and enable the CSD job execution, which depends on the completion of the CPU job. A timestamp query job is a job that calculates the value of the timestamp query and updates the availability of the query. In order to implement this job, we had to change the Mesa implementation of the timestamp. Now, the timestamp query value is tracked in a BO, instead of using a memory address. Moreover, the timestamp query availability is tracked with a syncobj, which is signaled when the query is available. A reset timestamp queries job is a job that resets the timestamp queries by zeroing the timestamp BO in the right positions. The right position on the timestamp BO is found through the offset of the first query. A reset performance queries job is a job that zeros the values of the performance monitors associated to that query. Moreover, it resets the availability syncobj related to that query. A copy query results job is a job that copy the results of a query to a BO in a given offset with a given stride. The patchset is divided as such: * #1 - #4: refactoring operations to prepare for the introduction of the CPU job * #5: addressing a vulnerability in the multisync extension * #6: decouple job allocation from job initiation * #7 - #9: introduction of the CPU job * #10 - #11: refactoring operations to prepare for the introduction of the indirect CSD job * #12: introduction of the indirect CSD job * #13: introduction of the timestamp query job * #14: introduction of the reset timestamp queries job * #15: introduction of the copy timestamp query results job * #16: introduction of the reset performance queries job * #17: introduction of the copy performance query results job This patchset has its Mesa counterpart, which is available on [1]. Both the kernel and Mesa implementation were tested with * `dEQP-VK.compute.pipeline.indirect_dispatch.*`, * `dEQP-VK.pipeline.monolithic.timestamp.*`, * `dEQP-VK.synchronization.*`, * `dEQP-VK.query_pool.*` * and `dEQP-VK.multiview.*`. [1] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/v5/cpu-job Changelog ========= v1 -> v2: https://lore.kernel.org/dri-devel/20230904175019.1172713-1-mcanal@xxxxxxxxxx/ * Rebase on top of drm-misc-next. * Add GPU stats to the CPU queue. v2 -> v3: https://lore.kernel.org/dri-devel/20231124012548.772095-1-mcanal@xxxxxxxxxx/ * Don't cast struct v3d_*_job to void *, use &job->base (Iago Toral) * Completely decouple job allocation from initialization (Iago Toral & Melissa Wen) * s/externsion/extension (Iago Toral) * Declare all CPU job functions as static const outside of the function (Iago Toral) * Document how many BOs are expected for each CPU job extension (Iago Toral) * Check if the number of BOs in the IOCTL matches the expectation (Iago Toral) Best Regards, - Maíra Maíra Canal (11): drm/v3d: Don't allow two multisync extensions in the same job drm/v3d: Decouple job allocation from job initiation drm/v3d: Use v3d_get_extensions() to parse CPU job data drm/v3d: Create tracepoints to track the CPU job drm/v3d: Enable BO mapping drm/v3d: Create a CPU job extension for a indirect CSD job drm/v3d: Create a CPU job extension for the timestamp query job drm/v3d: Create a CPU job extension for the reset timestamp job drm/v3d: Create a CPU job extension to copy timestamp query to a buffer drm/v3d: Create a CPU job extension for the reset performance query job drm/v3d: Create a CPU job extension for the copy performance query job Melissa Wen (6): drm/v3d: Remove unused function header drm/v3d: Move wait BO ioctl to the v3d_bo file drm/v3d: Detach job submissions IOCTLs to a new specific file drm/v3d: Simplify job refcount handling drm/v3d: Add a CPU job submission drm/v3d: Detach the CSD job BO setup drivers/gpu/drm/v3d/Makefile | 3 +- drivers/gpu/drm/v3d/v3d_bo.c | 51 ++ drivers/gpu/drm/v3d/v3d_drv.c | 4 + drivers/gpu/drm/v3d/v3d_drv.h | 134 ++- drivers/gpu/drm/v3d/v3d_gem.c | 768 ----------------- drivers/gpu/drm/v3d/v3d_sched.c | 315 +++++++ drivers/gpu/drm/v3d/v3d_submit.c | 1318 ++++++++++++++++++++++++++++++ drivers/gpu/drm/v3d/v3d_trace.h | 57 ++ include/uapi/drm/v3d_drm.h | 240 +++++- 9 files changed, 2110 insertions(+), 780 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_submit.c -- 2.42.0