On Wed, Aug 8, 2018 at 4:58 PM Huang Rui <ray.huang at amd.com> wrote:

> On Wed, Aug 08, 2018 at 03:10:07PM +0800, Koenig, Christian wrote:
> > Yeah, that is a known issue, but this solution is not correct either.
> >
> > See, the scheduler the job is executed on is simply not determined yet
> > when we want to trace it.
> >
> > So using the scheduler name from the entity is wrong as well.
> >
> > We should probably move the reschedule from drm_sched_entity_push_job()
> > to drm_sched_job_init() to fix that.
>
> Could you please explain why moving the reschedule alone can fix the
> issue? Seemingly, if only the s_fence's sched is set to the entity's
> rq->sched, the issue can be avoided:
>
> sched_job->s_fence->sched = entity->rq->sched
>

Because entity->rq->sched is not necessarily the scheduler on which this
job will get scheduled, and assigning a wrong scheduler could lead to
wrong dependency optimizations. Hence s_fence->sched is initially left
NULL: we don't yet know which scheduler the job will be scheduled on, and
this avoids any wrong optimizations.

Cheers,
Nayan
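To make the point about "wrong dependency optimizations" concrete: the
scheduler pointer stored in the fence is what the same-scheduler fast path
keys on. Below is a simplified sketch in the spirit of the dependency
handling in drm_sched_entity.c of that era; it is a paraphrase for
illustration, not the verbatim kernel function, and the name
sketch_add_dependency_cb() is made up here.

```c
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/*
 * Simplified sketch (not verbatim kernel code): when a dependency fence
 * comes from the same scheduler this job will run on, the scheduler only
 * has to wait for the dependency to be *scheduled*, not *finished*,
 * because the hardware ring preserves ordering.  If s_fence->sched were
 * set to entity->rq->sched before the job's scheduler is actually
 * decided, this comparison could take the fast path for a job that ends
 * up on a different scheduler -- a wrong dependency optimization.
 */
static bool sketch_add_dependency_cb(struct drm_sched_entity *entity)
{
	struct drm_gpu_scheduler *sched = entity->rq->sched;
	struct drm_sched_fence *s_fence = to_drm_sched_fence(entity->dependency);

	if (s_fence && s_fence->sched == sched) {
		/* Same scheduler: only wait for the "scheduled" fence. */
		struct dma_fence *fence = dma_fence_get(&s_fence->scheduled);

		dma_fence_put(entity->dependency);
		entity->dependency = fence;
	}
	/* ... register a callback on entity->dependency ... */
	return true;
}
```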
> Thanks,
> Ray
>
> >
> > I will prepare a patch for that today,
> > Christian.
> >
> > On 08.08.2018 at 09:05, Huang Rui wrote:
> > > We won't initialize the fence scheduler in drm_sched_fence_create()
> > > anymore, so a NULL fence scheduler is dereferenced when the trace
> > > event is enabled and tries to get the timeline name. The name is
> > > actually the scheduler name from the entity, so add a macro that
> > > replaces the legacy way of getting the timeline name from the job.
> > >
> > > [ 212.844281] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> > > [ 212.852401] PGD 8000000427c13067 P4D 8000000427c13067 PUD 4235fc067 PMD 0
> > > [ 212.859419] Oops: 0000 [#1] SMP PTI
> > > [ 212.862981] CPU: 4 PID: 1520 Comm: amdgpu_test Tainted: G OE 4.18.0-rc1-custom #1
> > > [ 212.872194] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
> > > [ 212.881704] RIP: 0010:drm_sched_fence_get_timeline_name+0x2b/0x30 [gpu_sched]
> > > [ 212.888948] Code: 1f 44 00 00 48 8b 47 08 48 3d c0 b1 4f c0 74 13 48 83 ef 60 48 3d 60 b1 4f c0 b8 00 00 00 00 48 0f 45 f8 48 8b 87 e0 00 00 00 <48> 8b 40 18 c3 0f 1f 44 00 00 b8 01 00 00 00 c3 0f 1f 44 00 00 0f
> > > [ 212.908162] RSP: 0018:ffffa3ed81f27af0 EFLAGS: 00010246
> > > [ 212.913483] RAX: 0000000000000000 RBX: 0000000000070034 RCX: ffffa3ed81f27da8
> > > [ 212.920735] RDX: ffff8f24ebfb5460 RSI: ffff8f24e40d3c00 RDI: ffff8f24ebfb5400
> > > [ 212.928008] RBP: ffff8f24e40d3c00 R08: 0000000000000000 R09: ffffffffae4deafc
> > > [ 212.935263] R10: ffffffffada000ed R11: 0000000000000001 R12: ffff8f24e891f898
> > > [ 212.942558] R13: 0000000000000000 R14: ffff8f24ebc46000 R15: ffff8f24e3de97a8
> > > [ 212.949796] FS: 00007ffff7fd2700(0000) GS:ffff8f24fed00000(0000) knlGS:0000000000000000
> > > [ 212.958047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 212.963921] CR2: 0000000000000018 CR3: 0000000423422003 CR4: 00000000003606e0
> > > [ 212.971201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 212.978482] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 212.985720] Call Trace:
> > > [ 212.988236]  trace_event_raw_event_amdgpu_cs_ioctl+0x4c/0x170 [amdgpu]
> > > [ 212.994904]  ? amdgpu_ctx_add_fence+0xa9/0x110 [amdgpu]
> > > [ 213.000246]  ? amdgpu_job_free_resources+0x4b/0x70 [amdgpu]
> > > [ 213.005944]  amdgpu_cs_ioctl+0x16d1/0x1b50 [amdgpu]
> > > [ 213.010920]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
> > > [ 213.016354]  drm_ioctl_kernel+0x8a/0xd0 [drm]
> > > [ 213.020794]  ? recalc_sigpending+0x17/0x50
> > > [ 213.024965]  drm_ioctl+0x2d7/0x390 [drm]
> > > [ 213.028979]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
> > > [ 213.034366]  ? do_signal+0x36/0x700
> > > [ 213.037928]  ? signal_wake_up_state+0x15/0x30
> > > [ 213.042375]  amdgpu_drm_ioctl+0x46/0x80 [amdgpu]
> > >
> > > Signed-off-by: Huang Rui <ray.huang at amd.com>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c    |  2 +-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 10 ++++++----
> > >  2 files changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > index e12871d..be01e1b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > @@ -1247,7 +1247,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > >
> > >  	amdgpu_job_free_resources(job);
> > >
> > > -	trace_amdgpu_cs_ioctl(job);
> > > +	trace_amdgpu_cs_ioctl(job, entity);
> > >  	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
> > >  	priority = job->base.s_priority;
> > >  	drm_sched_entity_push_job(&job->base, entity);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> > > index 8c2dab2..25cdcb7 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> > > @@ -36,6 +36,8 @@
> > >
> > >  #define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
> > >  	 job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
> > > +#define AMDGPU_GET_SCHED_NAME(entity) \
> > > +	(entity->rq->sched->name)
> > >
> > >  TRACE_EVENT(amdgpu_mm_rreg,
> > >  	    TP_PROTO(unsigned did, uint32_t reg, uint32_t value),
> > > @@ -161,11 +163,11 @@ TRACE_EVENT(amdgpu_cs,
> > >  );
> > >
> > >  TRACE_EVENT(amdgpu_cs_ioctl,
> > > -	    TP_PROTO(struct amdgpu_job *job),
> > > -	    TP_ARGS(job),
> > > +	    TP_PROTO(struct amdgpu_job *job, struct drm_sched_entity *entity),
> > > +	    TP_ARGS(job, entity),
> > >  	    TP_STRUCT__entry(
> > >  			     __field(uint64_t, sched_job_id)
> > > -			     __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
> > > +			     __string(timeline, AMDGPU_GET_SCHED_NAME(entity))
> > >  			     __field(unsigned int, context)
> > >  			     __field(unsigned int, seqno)
> > >  			     __field(struct dma_fence *, fence)
> > > @@ -175,7 +177,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
> > >
> > >  	    TP_fast_assign(
> > >  			   __entry->sched_job_id = job->base.id;
> > > -			   __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
> > > +			   __assign_str(timeline, AMDGPU_GET_SCHED_NAME(entity))
> > >  			   __entry->context = job->base.s_fence->finished.context;
> > >  			   __entry->seqno = job->base.s_fence->finished.seqno;
> > >  			   __entry->ring_name = to_amdgpu_ring(job->base.sched)->name;
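For reference, the oops quoted above decodes consistently with the commit
message: RAX is 0000000000000000 and the faulting instruction
(<48> 8b 40 18) is a load from 0x18(%rax), matching CR2 = 0000000000000018,
i.e. a read of sched->name through a NULL sched pointer. The callback at
the reported RIP looks roughly like the sketch below; this is an
illustration of drm_sched_fence_get_timeline_name() in gpu_sched, not the
verbatim kernel source, and sketch_sched_fence_get_timeline_name() is a
made-up name.

```c
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/*
 * Sketch of the callback at the faulting RIP (not verbatim).  After the
 * change described in the commit message, drm_sched_fence_create() leaves
 * fence->sched NULL, so the ->name load below dereferences a NULL
 * drm_gpu_scheduler -- the NULL pointer dereference at offset 0x18 seen
 * in the oops above.
 */
static const char *sketch_sched_fence_get_timeline_name(struct dma_fence *f)
{
	/* Recover the scheduler fence embedding this dma_fence. */
	struct drm_sched_fence *fence = to_drm_sched_fence(f);

	return (const char *)fence->sched->name;	/* NULL->name faults */
}
```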