Hi,
Le 10/06/2024 à 18:31, Daniel Vetter a écrit :
> On Mon, Jun 10, 2024 at 03:26:53PM +0200, Pierre-Eric Pelloux-Prayer
wrote:
>> v3:
https://lists.freedesktop.org/archives/dri-devel/2024-June/456792.html
>>
>> Changes since v3:
>> * trace device name instead of drm_device primary index
>> * no pointer deref in the TP_printk anymore. Instead the fence
context/seqno
>> are saved in TP_fast_assign
>
> Some high-level comments:
>
> - Quick summary of the what, why and how in the cover letter would be
> great.
Oops, I forgot to copy-paste the cover letter from v3.
Here it is, maybe whe 'Why' is missing? I'll improve it for v5.
----
The main ideas implemented are: trace dependencies between jobs and
identify the GPU running jobs (because 'ring' is not a unique attribute).
Changes from v2:
* dropped all amdgpu changes. The goal here is to make the gpu_scheduler
trace events usable by a tool, without dependencies on driver-specific
events
* dropped the vblank related changes. I'll post a separate series to
cover drm/vblank events later.
* reworked fence printing so it's coherent between all events.
* added a dev_index to let the tools parsing the events which GPU is
running a job. It wasn't needed before the gpu scheduler switch to
workqueues because the queue pid was enough to identify the hardware queue.
* dropped the changes to trace the "why" of a dependency. I implemented
a version based on Sima's suggestion using xa_tag_pointer but I'm not
convinced it's really useful, so I'm dropping it for now.
Open questions:
* assuming the new fence printing format is agreed on,
should we add some code to preserve the old format?
* checkpatch doesn't like the indentation in the last patch, because
everything is vertically aligned to 'TP_fast_assign('. How to best fix it?
WARNING: Statements should start on a tabstop
#82: FILE: drivers/gpu/drm/scheduler/gpu_scheduler_trace.h:80:
+ unsigned long idx;
----
>
> - Link to the userspace, once you have that. At least last time we
chatted
> that was still wip.
Userspace is at
https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37
Note that most of it also works fine without these patches, but some
features will be lacking:
* job dependencies
* multi-GPU system won't work as expected on 6.8+ kernels
The tool depends on amdgpu, but the part using these events can work on
any driver using gpu_scheduler.
I tried to use it on a RPi3 but couldn't figure out how to get the
system to use v3d :/
I've also opened an issue in gpuvis issue tracker to get more feedback
about these events.
>
> - Maybe most important to make this actually work, work well, and work
> long-term: I think we should clearly commit to these tracepoints being
> stable uapi, and document that by adding a stable tracepoint
section in
> the drm uapi book.
You mean, Documentation/gpu/drm-uapi.rst?
I can send a v5 with an updated cover letter and a new patch updating
the documentation.
Thanks,
Pierre-Eric
>
> And then get acks from a pile of driver maintainers that they really
> think this is a good idea and has a future. Should also help with
> getting good review on the tracepoints themselves.
>
> Otherwise I fear we'll miss the mark again and still force
userspace to
> hand-roll tracing for every driver, or maybe worse, even specific
kernel
> versions.
>
> Cheers, Sima
>
>>
>> Pierre-Eric Pelloux-Prayer (3):
>> drm/sched: add device name to the drm_sched_process_job event
>> drm/sched: cleanup gpu_scheduler trace events
>> drm/sched: trace dependencies for gpu jobs
>>
>> .../gpu/drm/scheduler/gpu_scheduler_trace.h | 97 +++++++++++++++----
>> drivers/gpu/drm/scheduler/sched_entity.c | 8 +-
>> drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>> 3 files changed, 84 insertions(+), 23 deletions(-)
>>
>> --
>> 2.40.1
>>
>