Hi Maíra, El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió: > This patchset implements the basic infrastructure for a new type of > V3D job, a CPU job. A CPU job is a job that requires CPU > intervention. > It would be nice to perform this operations on the kernel space as we > can attach multiple in/out syncobjs to it. > > Why we want a CPU job on the kernel? > ==================================== > > There are some Vulkan commands that cannot be performed by the GPU, > so > we implement those as CPU jobs on Mesa. But to synchronize a CPU job > in the user space, we need to hold part of the command submission > flow > in order to correctly synchronize their execution. > > By moving the CPU job to the kernel, we can make use of the DRM > schedule queues and all the advantages it brings with it. This way, > instead of stalling the submission thread, we can use syncobjs to > synchronize the job, providing a more effective management. > > About the implementation > ======================== > > After we decided that we would like to have a CPU job implementation > in the kernel, we could think about two possible implementations for > this job: creating an IOCTL for each type of CPU job or using an user > extension to provide a polymorphic behavior to a single CPU job > IOCTL. > We decided for the latter one. > > We have different types of CPU jobs (indirect CSD jobs, timestamp > query jobs, copy query results jobs...) and each of them have a > common > infrastructure, but perform different operations. Therefore, by using > a single IOCTL that is extended by an user extension, we can reuse > the > common infrastructure - avoiding code repetition - and yet use the > user extension ID to identify the type of job and depending on the > type of job, perform a certain operation. > > About the patchset > ================== > > This patchset introduces the basic infrastructure of a CPU job with a > new V3D queue (V3D_CPU) e new tracers. Moreover, it introduces six > types of CPU jobs: an indirect CSD job, a timestamp query job, a > reset timestamp queries job, a copy timestamp query results job, a > reset > performance queries job, and a copy performance query results job. > > An indirect CSD job is a job that, when executed in the queue, will > map the indirect buffer, read the dispatch parameters, and submit a > regular dispatch. So, the CSD job depends on the CPU job execution. > We > attach the wait dependencies to the CPU job and once they are > satisfied, > we read the dispatch parameters, rewrite the uniforms (if needed) and > enable the CSD job execution, which depends on the completion of the > CPU job. > > A timestamp query job is a job that calculates the value of the > timestamp query and updates the availability of the query. In order > to > implement this job, we had to change the Mesa implementation of the > timestamp. Now, the timestamp query value is tracked in a BO, instead > of using a memory address. Moreover, the timestamp query availability > is > tracked with a syncobj, which is signaled when the query is > available. > > A reset timestamp queries job is a job that resets the timestamp > queries by > zeroing the timestamp BO in the right positions. The right position > on > the timestamp BO is found through the offset of the first query. > > A reset performance queries job is a job that zeros the values of the > performance monitors associated to that query. Moreover, it resets > the > availability syncobj related to that query. > > A copy query results job is a job that copy the results of a query to > a > BO in a given offset with a given stride. > > The patchset is divided as such: > * #1 - #4: refactoring operations to prepare for the introduction of > the > CPU job > * #5: addressing a vulnerability in the multisync extension > * #6: decouple job allocation from job initiation > * #7 - #9: introduction of the CPU job > * #10 - #11: refactoring operations to prepare for the introduction > of the > indirect CSD job > * #12: introduction of the indirect CSD job > * #13: introduction of the timestamp query job > * #14: introduction of the reset timestamp queries job > * #15: introduction of the copy timestamp query results job > * #16: introduction of the reset performance queries job > * #17: introduction of the copy performance query results job > > This patchset has its Mesa counterpart, which is available on [1]. > > Both the kernel and Mesa implementation were tested with > > * `dEQP-VK.compute.pipeline.indirect_dispatch.*`, > * `dEQP-VK.pipeline.monolithic.timestamp.*`, > * `dEQP-VK.synchronization.*`, > * `dEQP-VK.query_pool.*` > * and `dEQP-VK.multiview.*`. > > [1] > https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/v5/cpu-job > > Changelog > ========= > > v1 -> v2: > https://lore.kernel.org/dri-devel/20230904175019.1172713-1-mcanal@xxxxxxxxxx/ > > * Rebase on top of drm-misc-next. > * Add GPU stats to the CPU queue. > > v2 -> v3: > https://lore.kernel.org/dri-devel/20231124012548.772095-1-mcanal@xxxxxxxxxx/ > > * Don't cast struct v3d_*_job to void *, use &job->base (Iago Toral) > * Completely decouple job allocation from initialization (Iago Toral > & Melissa Wen) > * s/externsion/extension (Iago Toral) > * Declare all CPU job functions as static const outside of the > function (Iago Toral) > * Document how many BOs are expected for each CPU job extension (Iago > Toral) > * Check if the number of BOs in the IOCTL matches the expectation > (Iago Toral) > > Best Regards, > - Maíra > > Maíra Canal (11): > drm/v3d: Don't allow two multisync extensions in the same job > drm/v3d: Decouple job allocation from job initiation > drm/v3d: Use v3d_get_extensions() to parse CPU job data > drm/v3d: Create tracepoints to track the CPU job > drm/v3d: Enable BO mapping > drm/v3d: Create a CPU job extension for a indirect CSD job > drm/v3d: Create a CPU job extension for the timestamp query job > drm/v3d: Create a CPU job extension for the reset timestamp job > drm/v3d: Create a CPU job extension to copy timestamp query to a > buffer > drm/v3d: Create a CPU job extension for the reset performance query > job > drm/v3d: Create a CPU job extension for the copy performance query > job > > Melissa Wen (6): > drm/v3d: Remove unused function header > drm/v3d: Move wait BO ioctl to the v3d_bo file > drm/v3d: Detach job submissions IOCTLs to a new specific file > drm/v3d: Simplify job refcount handling > drm/v3d: Add a CPU job submission > drm/v3d: Detach the CSD job BO setup > > drivers/gpu/drm/v3d/Makefile | 3 +- > drivers/gpu/drm/v3d/v3d_bo.c | 51 ++ > drivers/gpu/drm/v3d/v3d_drv.c | 4 + > drivers/gpu/drm/v3d/v3d_drv.h | 134 ++- > drivers/gpu/drm/v3d/v3d_gem.c | 768 ----------------- > drivers/gpu/drm/v3d/v3d_sched.c | 315 +++++++ > drivers/gpu/drm/v3d/v3d_submit.c | 1318 > ++++++++++++++++++++++++++++++ > drivers/gpu/drm/v3d/v3d_trace.h | 57 ++ > include/uapi/drm/v3d_drm.h | 240 +++++- > 9 files changed, 2110 insertions(+), 780 deletions(-) > create mode 100644 drivers/gpu/drm/v3d/v3d_submit.c > > -- > 2.42.0 I shared a few minor nits but otherwise I think this looks good. With those nits fixed the series is: Reviewed-by: Iago Toral Quiroga <itoral@xxxxxxxxxx> Thanks! Iago