On 2017/05/03 14:02, Liu, Monk wrote:
> You can add ctx as a field of job, but not get a reference to it; when you
> try to use ctx, just check if ctx == NULL.
>
> that doesn't work at all... job->ctx will always be non-NULL after
> it is initialized, you just refer to a wild pointer after the CTX is released
job->ctx is a **ctx, which could resolve this concern of yours.
>
> Another stupid method:
> Use idr_for_each_entry(..job->vm->ctx_mgr...) and compare the
> job->fence->fence_context with ctx->ring[].entity->fence_context. If
> not found, then the ctx has been freed; otherwise you can do your things
> for this ctx.
>
> 1) fence_context has a chance to incorrectly represent the context
> behind it, because the number can be used up and will wrap around
> from the beginning
No, fence_context is globally unique in the kernel.

Regards,
David Zhou
> 2) why not just keep CTX alive, which is much simpler than this method
>
> BR
>
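For illustration only, a minimal sketch of the lookup described above. It is not part of the patch, and it assumes the amdgpu/scheduler structures of this series (mgr->lock and mgr->ctx_handles in amdgpu_ctx_mgr, ctx->rings[i].entity.fence_context, and the context field of struct fence):

static struct amdgpu_ctx *
amdgpu_ctx_lookup_by_fence(struct amdgpu_ctx_mgr *mgr, struct fence *fence)
{
        struct amdgpu_ctx *ctx;
        unsigned i;
        int id;

        mutex_lock(&mgr->lock);
        idr_for_each_entry(&mgr->ctx_handles, ctx, id) {
                for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                        /* fence contexts are allocated from a global 64-bit
                         * counter and are not reused, so a match identifies
                         * the entity (and therefore the ctx) behind the job */
                        if (ctx->rings[i].entity.fence_context == fence->context) {
                                mutex_unlock(&mgr->lock);
                                return ctx;
                        }
                }
        }
        mutex_unlock(&mgr->lock);
        return NULL;    /* the ctx has already been freed */
}

Note the returned pointer is only meaningful while mgr->lock still protects against concurrent context destruction, which is part of why the thread also debates simply holding a reference on the ctx instead.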
> -----Original Message-----
> From: Zhou, David(ChunMing)
> Sent: Wednesday, May 03, 2017 12:54 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Liu, Monk <Monk.Liu at amd.com>;
> Christian König <deathsimple at vodafone.de>; amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all job finished
>
> You can add ctx as a field of job, but not get a reference to it; when you
> try to use ctx, just check if ctx == NULL.
>
> Another stupid method:
> Use idr_for_each_entry(..job->vm->ctx_mgr...) and compare the
> job->fence->fence_context with ctx->ring[].entity->fence_context. If
> not found, then the ctx has been freed; otherwise you can do your things
> for this ctx.
>
> Regards,
> David Zhou
>
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Liu, Monk
> Sent: Wednesday, May 03, 2017 11:57 AM
> To: Liu, Monk <Monk.Liu at amd.com>; Christian König
> <deathsimple at vodafone.de>; amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all job finished
>
> @Christian,
>
> The thing is I need to mark all entities behind this timed-out job as
> guilty, so I use a member in the entity named "ptr_guilty" to point to
> the member in amdgpu_ctx named "guilty". That way, reading
> *entity->ptr_guilty tells you whether this entity is "invalid", and
> setting *entity->ptr_guilty lets you mark all entities belonging to the
> context as "invalid".
>
> The above logic is to guarantee we can kick out all guilty entities in
> the entity kfifo, and also block all IOCTLs with a ctx handle pointing to
> this guilty context, and we only recover other jobs/entities/contexts
> after the scheduler is unparked.
>
> If you reject the patch to keep ctx alive till all jobs are signaled,
> please give me a solution that satisfies the above logic.
>
> BR Monk
>
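As a minimal sketch of the ptr_guilty idea described above (assuming the ptr_guilty pointer added by this patch and a bool "guilty" member in amdgpu_ctx from elsewhere in the series; the helper names are illustrative, not existing driver functions):

/* On timeout: mark the whole context guilty through any one of its entities. */
static void amd_sched_entity_set_guilty(struct amd_sched_entity *entity)
{
        if (entity->ptr_guilty)         /* kernel entities have no context behind them */
                *entity->ptr_guilty = true;
}

/* On submission (CS IOCTL / job push): refuse further work from a guilty context. */
static bool amd_sched_entity_is_guilty(struct amd_sched_entity *entity)
{
        return entity->ptr_guilty && *entity->ptr_guilty;
}

Since every entity of a context points at the same ctx->guilty flag, setting it through one entity invalidates all entities of that context at once.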
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Liu, Monk
> Sent: Wednesday, May 03, 2017 11:31 AM
> To: Christian König <deathsimple at vodafone.de>;
> amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all job finished
>
> 1, This is necessary, otherwise how can I access the entity pointer after
> a job has timed out, and why is it dangerous?
> 2, what's the status field in the fences you were referring to? I
> need to judge if it could satisfy my requirement
>
>
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Monday, May 01, 2017 10:48 PM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 1/5] drm/amdgpu:keep ctx alive till all job finished
>
> On 01.05.2017 at 09:22, Monk Liu wrote:
> > for the TDR guilty context feature, we need to access ctx/s_entity field
> > members through the sched job pointer, so ctx must be kept alive till
> > all jobs from it are signaled.
>
> NAK, that is unnecessary and quite dangerous.
>
> Instead we have the designed status field in the fences, which should
> be checked for that.
>
> Regards,
> Christian.
>
> >
> > Change-Id: Ib87e9502f7a5c8c054c7e56956d7f7ad75998e43
> > Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 6 +++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 9 +++++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 9 +++++++--
> >   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 6 ------
> >   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h | 1 +
> >   6 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index e330009..8e031d6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -760,10 +760,12 @@ struct amdgpu_ib {
> >           uint32_t                        flags;
> >   };
> >
> > +struct amdgpu_ctx;
> > +
> >   extern const struct amd_sched_backend_ops amdgpu_sched_ops;
> >
> >   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
> > -                     struct amdgpu_job **job, struct amdgpu_vm *vm);
> > +                     struct amdgpu_job **job, struct amdgpu_vm *vm, struct amdgpu_ctx *ctx);
> >   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
> >                                struct amdgpu_job **job);
> >
> > @@ -802,6 +804,7 @@ struct amdgpu_ctx_mgr {
> >
> >   struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);
> >   int amdgpu_ctx_put(struct amdgpu_ctx *ctx);
> > +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct amdgpu_ctx *ctx);
> >
> >   uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
> >                                 struct fence *fence);
> > @@ -1129,6 +1132,7 @@ struct amdgpu_job {
> >           struct amdgpu_sync      sync;
> >           struct amdgpu_ib        *ibs;
> >           struct fence            *fence; /* the hw fence */
> > +         struct amdgpu_ctx       *ctx;
> >           uint32_t                preamble_status;
> >           uint32_t                num_ibs;
> >           void                    *owner;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 699f5fe..267fb65 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -234,7 +234,7 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
> >                   }
> >           }
> >
> > -         ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
> > +         ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm, p->ctx);
> >           if (ret)
> >                   goto free_all_kdata;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > index b4bbbb3..81438af 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > @@ -25,6 +25,13 @@
> >   #include <drm/drmP.h>
> >   #include "amdgpu.h"
> >
> > +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct amdgpu_ctx *ctx) {
> > +         if (ctx)
> > +                 kref_get(&ctx->refcount);
> > +         return ctx;
> > +}
> > +
> >   static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
> >   {
> >           unsigned i, j;
> > @@ -56,6 +63,8 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
> >                             rq, amdgpu_sched_jobs);
> >           if (r)
> >                   goto failed;
> > +
> > +         ctx->rings[i].entity.ptr_guilty = &ctx->guilty; /* kernel entity doesn't have ptr_guilty */
> >           }
> >
> >           return 0;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 690ef3d..208da11 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -40,7 +40,7 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job)
> >   }
> >
> >   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
> > -                     struct amdgpu_job **job, struct amdgpu_vm *vm)
> > +                     struct amdgpu_job **job, struct amdgpu_vm *vm, struct amdgpu_ctx *ctx)
> >   {
> >           size_t size = sizeof(struct amdgpu_job);
> >
> > @@ -57,6 +57,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
> >           (*job)->vm = vm;
> >           (*job)->ibs = (void *)&(*job)[1];
> >           (*job)->num_ibs = num_ibs;
> > +         (*job)->ctx = amdgpu_ctx_kref_get(ctx);
> >
> >           amdgpu_sync_create(&(*job)->sync);
> >
> > @@ -68,7 +69,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
> >   {
> >           int r;
> >
> > -         r = amdgpu_job_alloc(adev, 1, job, NULL);
> > +         r = amdgpu_job_alloc(adev, 1, job, NULL, NULL);
> >           if (r)
> >                   return r;
> >
> > @@ -94,6 +95,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
> >   static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
> >   {
> >           struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);
> > +         struct amdgpu_ctx *ctx = job->ctx;
> > +
> > +         if (ctx)
> > +                 amdgpu_ctx_put(ctx);
> >
> >           fence_put(job->fence);
> >           amdgpu_sync_free(&job->sync);
> > diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> > index 6f4e31f..9100ca8 100644
> > --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> > @@ -208,12 +208,6 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
> >           if (!amd_sched_entity_is_initialized(sched, entity))
> >                   return;
> >
> > -         /**
> > -          * The client will not queue more IBs during this fini, consume existing
> > -          * queued IBs
> > -          */
> > -         wait_event(sched->job_scheduled, amd_sched_entity_is_idle(entity));
> > -
> >           amd_sched_rq_remove_entity(rq, entity);
> >           kfifo_free(&entity->job_queue);
> >   }
> > diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> > index 8cb41d3..ccbbcb0 100644
> > --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> > +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> > @@ -49,6 +49,7 @@ struct amd_sched_entity {
> >
> >           struct fence                    *dependency;
> >           struct fence_cb                 cb;
> > +         bool *ptr_guilty;
> >   };
> >
> >   /**
> >
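For illustration, with the reference taken in amdgpu_job_alloc() and dropped in amdgpu_job_free_cb() above, a timeout handler can dereference job->ctx without racing against context destruction. A rough sketch only, not the driver's actual handler; the guilty flag is an assumed field from the rest of the series:

static void amdgpu_job_timedout_sketch(struct amd_sched_job *s_job)
{
        struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);

        /*
         * job->ctx stays valid here: the job holds a reference from
         * amdgpu_job_alloc() that is only dropped in amdgpu_job_free_cb().
         */
        if (job->ctx)
                job->ctx->guilty = true;        /* assumed bool member of amdgpu_ctx */
}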