On 2017-02-24 03:13 AM, Christian König wrote:
> Am 23.02.2017 um 21:48 schrieb Andres Rodriguez:
>>
>> On 2017-02-23 02:46 PM, Andres Rodriguez wrote:
>>>
>>> On 2017-02-23 03:20 AM, Christian König wrote:
>>>> Am 23.02.2017 um 00:47 schrieb Andres Rodriguez:
>>>>> This trace is intended to provide the required information to
>>>>> associate the completion of an amdgpu_job with its corresponding
>>>>> dma_fence_* tracepoints.
>>>>>
>>>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 ++
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 22 ++++++++++++++++++++++
>>>>>  2 files changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> index 86a1242..84a04e4 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> @@ -177,6 +177,8 @@ static struct dma_fence *amdgpu_job_run(struct amd_sched_job *sched_job)
>>>>>      /* if gpu reset, hw fence will be replaced here */
>>>>>      dma_fence_put(job->fence);
>>>>>      job->fence = dma_fence_get(fence);
>>>>> +    trace_amdgpu_job_attach_fence(job);
>>>>> +
>>>>>      amdgpu_job_free_resources(job);
>>>>>      return fence;
>>>>>  }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>> index a18ae1e..0a61ed9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>> @@ -147,6 +147,28 @@ TRACE_EVENT(amdgpu_sched_run_job,
>>>>>            __entry->fence, __entry->ring_name, __entry->num_ibs)
>>>>>  );
>>>>> +TRACE_EVENT(amdgpu_job_attach_fence,
>>>>> +        TP_PROTO(struct amdgpu_job *job),
>>>>> +        TP_ARGS(job),
>>>>> +        TP_STRUCT__entry(
>>>>> +             __field(struct amdgpu_device *, adev)
>>>>> +             __field(struct amd_sched_job *, sched_job)
>>>>
>>>> Neither add the adev nor the sched_job pointer to your trace point.
>>>
>>> Would it be okay then to grab a reference to the scheduler fence in
>>> amdgpu_job_run and print the sched fence pointer here?
>>>
>>> That would give us enough info to correlate other trace points like
>>> amdgpu_cs_ioctl or amdgpu_sched_run_job with the respective dma_fence
>>> tracepoints.
>>>
>>>> The adev pointer is completely meaningless in the logs, and the
>>>> scheduler job might already be freed when the printk is called.
>>
>> Hey Christian,
>>
>> I was just double checking the lifetime of the scheduler job, and it
>> should still be valid during amdgpu_job_run. The lifetime of the job
>> is tied to the scheduler's finished fence, which won't be signaled
>> until amdgpu_job_run finishes, even if the HW fence has signaled.
>>
>> This is assuming that the trace printks are synchronous.
>
> That assumption is incorrect.
>
> The parameters are stored in a ring buffer when the trace point is
> executed, but the string is composed later, when you request the trace
> content from userspace.
>
> In addition, printing pointers that are heavily reallocated is a bad
> idea, because once the job is freed the pointer will certainly be
> reused for another job.
>
> Using the numbers from the scheduler fence is the right approach here,
> and you don't even need to grab another reference; just save the
> context and sequence number as the identifier for the job.

Thanks for the explanation.
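Just to make sure I understood, something like this is what I'd record
instead (a rough, untested sketch — I'm assuming the scheduler fence is
reachable through job->base.s_fence and that its finished fence carries
the context/seqno pair; the exact field layout may differ):

TRACE_EVENT(amdgpu_job_attach_fence,
	    TP_PROTO(struct amdgpu_job *job),
	    TP_ARGS(job),
	    TP_STRUCT__entry(
			     /* identify the job by the scheduler fence's
			      * context/seqno instead of raw pointers that
			      * may be reused once the job is freed */
			     __field(unsigned int, sched_context)
			     __field(unsigned int, sched_seqno)
			     __string(timeline,
				      job->fence->ops->get_timeline_name(job->fence))
			     __field(unsigned int, context)
			     __field(unsigned int, seqno)
			     ),

	    TP_fast_assign(
			   __entry->sched_context =
				job->base.s_fence->finished.context;
			   __entry->sched_seqno =
				job->base.s_fence->finished.seqno;
			   __assign_str(timeline,
					job->fence->ops->get_timeline_name(job->fence));
			   __entry->context = job->fence->context;
			   __entry->seqno = job->fence->seqno;
			   ),

	    TP_printk("sched_context=%u, sched_seqno=%u, timeline=%s, context=%u, seqno=%u",
		      __entry->sched_context, __entry->sched_seqno,
		      __get_str(timeline), __entry->context, __entry->seqno)
);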
Using the scheduler fence as you suggested also means that we have the
data available in amdgpu_sched_run_job, so we won't require an extra
trace point. That is cleaner overall. I'll send a patch using that
approach instead. (For reference, the dma_fence event class we'd be
correlating against is sketched after the quoted text below.)

I won't be removing anything from amdgpu_sched_run_job though, since
I'm not sure of the backwards compatibility rules for trace events.

Regards,
Andres

> Regards,
> Christian.
>
>> So I think printing the job and dropping adev should be okay if the
>> above holds true, unless I'm missing something else.
>>
>>>> Just checked the source, and a couple of other trace points get
>>>> this horribly wrong as well, so that isn't your fault. Going to put
>>>> it on my todo list to fix those.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +        __string(timeline, job->fence->ops->get_timeline_name(job->fence))
>>>>> +        __field(unsigned int, context)
>>>>> +        __field(unsigned int, seqno)
>>>>> +        ),
>>>>> +
>>>>> +        TP_fast_assign(
>>>>> +              __entry->adev = job->adev;
>>>>> +              __entry->sched_job = &job->base;
>>>>> +              __assign_str(timeline, job->fence->ops->get_timeline_name(job->fence))
>>>>> +              __entry->context = job->fence->context;
>>>>> +              __entry->seqno = job->fence->seqno;
>>>>> +              ),
>>>>> +        TP_printk("adev=%p, sched_job=%p, timeline:%s, context:%u, seqno:%u",
>>>>> +              __entry->adev, __entry->sched_job,
>>>>> +              __get_str(timeline), __entry->context, __entry->seqno)
>>>>> +);
>>>>>  TRACE_EVENT(amdgpu_vm_grab_id,
>>>>>      TP_PROTO(struct amdgpu_vm *vm, int ring, struct amdgpu_job *job),
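For reference, the dma_fence tracepoints key on the same context/seqno
pair, which is what makes the correlation work. The shared event class
in include/trace/events/dma_fence.h looks roughly like this (reproduced
from memory of the 4.10-era header, so treat the details as
approximate):

DECLARE_EVENT_CLASS(dma_fence,
	TP_PROTO(struct dma_fence *fence),
	TP_ARGS(fence),
	TP_STRUCT__entry(
		__string(driver, fence->ops->get_driver_name(fence))
		__string(timeline, fence->ops->get_timeline_name(fence))
		__field(unsigned int, context)
		__field(unsigned int, seqno)
	),
	TP_fast_assign(
		__assign_str(driver, fence->ops->get_driver_name(fence));
		__assign_str(timeline, fence->ops->get_timeline_name(fence));
		__entry->context = fence->context;
		__entry->seqno = fence->seqno;
	),
	/* context + seqno uniquely identify a fence on its timeline,
	 * the same pair saved on the amdgpu side above */
	TP_printk("driver=%s timeline=%s context=%u seqno=%u",
		  __get_str(driver), __get_str(timeline), __entry->context,
		  __entry->seqno)
);

Matching the (context, seqno) recorded by amdgpu_sched_run_job against
the corresponding dma_fence_signaled entries then gives the completion
time of each job without relying on pointer identity.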