Re: [PATCH] drm/scheduler: fix race condition in load balancer

On 14.01.20 at 17:20, Nirmoy wrote:

On 1/14/20 5:01 PM, Christian König wrote:
On 14.01.20 at 16:43, Nirmoy Das wrote:
Jobs submitted to an entity should execute in the order in which they
were submitted. We make sure of that by checking entity->job_queue in
drm_sched_entity_select_rq(), so that we don't load-balance jobs within
an entity.

But because entity->job_queue is only updated later, in
drm_sched_entity_push_job(), there remains an open window in which
entity->rq can still be updated by drm_sched_entity_select_rq(), which
should not be allowed.
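
To spell out the window being described, here is a minimal, purely
illustrative userspace sketch. The fake_* names and the single-threaded
interleaving are stand-ins for the real drm_sched structures and for
concurrent submitters; this is not actual kernel code:

#include <stdio.h>

/* Illustrative stand-ins for drm_sched_rq / drm_sched_entity. */
struct fake_rq { int id; };
struct fake_entity {
	struct fake_rq *rq;	/* like entity->rq */
	int queued_jobs;	/* like entity->job_queue being (non-)empty */
};

/* Like drm_sched_entity_select_rq(): only rebalance while nothing is queued. */
static void fake_select_rq(struct fake_entity *e, struct fake_rq *least_loaded)
{
	if (e->queued_jobs == 0)
		e->rq = least_loaded;
}

/* Like drm_sched_entity_push_job(): makes the queued job visible. */
static void fake_push_job(struct fake_entity *e)
{
	e->queued_jobs++;
}

int main(void)
{
	struct fake_rq rq0 = { 0 }, rq1 = { 1 };
	struct fake_entity e = { .rq = &rq0, .queued_jobs = 0 };

	/* Submitter A: job init ran, select_rq() kept rq0, but the job
	 * is not pushed yet, so queued_jobs is still 0 ... */
	fake_select_rq(&e, &rq0);

	/* ... the window: a concurrent submitter B also calls select_rq(),
	 * still sees an empty queue and migrates the entity to rq1 ... */
	fake_select_rq(&e, &rq1);

	/* ... and only now does A push its job, against the old rq. */
	fake_push_job(&e);

	printf("entity ended up on rq%d with %d queued job(s)\n",
	       e.rq->id, e.queued_jobs);
	return 0;
}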

NAK, concurrent calls to drm_sched_job_init()/drm_sched_entity_push_job() are not allowed in the first place; otherwise we would mess up the fence sequence order and risk memory corruption.
Unless I am missing something, I don't see any lock protecting the drm_sched_job_init()/drm_sched_entity_push_job() calls in amdgpu_cs_submit().

See one step up in the call chain, function amdgpu_cs_ioctl().

This locks the page tables, which also makes access to the context and entities mutually exclusive:
        r = amdgpu_cs_parser_bos(&parser, data);
...
        r = amdgpu_cs_submit(&parser, cs);

out:

And here the page tables are unlocked again:
        amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
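
Put differently, the submission path is effectively serialized like the
following sketch (again purely illustrative, reusing the fake_* helpers
from the sketch further up; the mutex stands in for the page-table/BO
reservation and is not the actual amdgpu locking):

#include <pthread.h>

/* Stand-in for the reservation taken via amdgpu_cs_parser_bos() and
 * dropped again in amdgpu_cs_parser_fini(). */
static pthread_mutex_t resv_lock = PTHREAD_MUTEX_INITIALIZER;

static void fake_cs_ioctl(struct fake_entity *e, struct fake_rq *least_loaded)
{
	pthread_mutex_lock(&resv_lock);		/* amdgpu_cs_parser_bos()       */

	fake_select_rq(e, least_loaded);	/* drm_sched_job_init() path    */
	fake_push_job(e);			/* drm_sched_entity_push_job()  */

	pthread_mutex_unlock(&resv_lock);	/* amdgpu_cs_parser_fini()      */
}

With the lock held across both calls, no second submitter can run the
select_rq step in between, which is the mutual exclusion described above.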

Regards,
Christian.



Regards,

Nirmoy


_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



