Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2021-08-31 12:01 p.m., Luben Tuikov wrote:
On 2021-08-31 11:23, Andrey Grodzovsky wrote:
On 2021-08-31 10:38 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
It's says patch [2/2] but i can't find patch 1

On 2021-08-31 6:35 a.m., Monk Liu wrote:
tested-by: jingwen chen <jingwen.chen@xxxxxxx>
Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
Signed-off-by: jingwen chen <jingwen.chen@xxxxxxx>
---
     drivers/gpu/drm/scheduler/sched_main.c | 24 ++++--------------------
     1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ecf8140..894fdb24 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,19 +319,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
     	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
     	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	if (!__kthread_should_park(sched->thread))
+		kthread_park(sched->thread);
+
As mentioned before, without serializing against other TDR handlers from
other
schedulers you just race here against them, e.g. you parked it now but
another
one in progress will unpark it as part of calling  drm_sched_start for other
rings[1]
Unless I am missing something since I haven't found patch [1/2]

[1] - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Famd%2Famdgpu%2Famdgpu_device.c%23L5041&amp;data=04%7C01%7Cluben.tuikov%40amd.com%7C228bd1600c914efe24aa08d96c934bbb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637660202148713283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=PrrvFHAwDeLlbcOctlKHsCFs9%2F56XNVzoLVcT1RoJgI%3D&amp;reserved=0
You need to have your own wq and run all your tdr work on the same wq if
your reset has any cross-engine impact.
IMHO what is problematic in serializing vs. locking (with trylock and bail
out like we do in [1]) is for multiple TO events arising from same reason
like maybe one job just waits for another and once first is hanged the
second will also appear to be hanged triggering it's own TO event.
In this case multiple TOs event will trigger multiple resets if we serialize
but if we use lock with trylock the second one will quietly bail out.

[1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Famd%2Famdgpu%2Famdgpu_device.c%23L4903&amp;data=04%7C01%7Cluben.tuikov%40amd.com%7C228bd1600c914efe24aa08d96c934bbb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637660202148713283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=kxSWBoshVTLMMFIFZdPsP4MBgUAoC%2F3szo9GUemSRLY%3D&amp;reserved=0
Hm so I guess a single wq here, that will hold up all other TO. And they
should recheck whether the job is moving meanwhile.
Can you clarify about this ? What job should be moving ? The dependent job ?


Also unless you use hw semaphores the job shouldn't even start before the
deps are singalled, so not sure how this goes wrong?
What about a simple example where
we actually can submit a shader on one ring and a simple
WAIT_REG_MEM packet on another to wait for the shader to write
a specific value to specific memory location. Here you have both of them
started
in close proximity and no explicit dependencies involved (at the
scheduler level)
and yet if the shader hangs also the WAIT_REG_MEM job will hang.


The vm_id flush stuff can make things a bit more fun for your specific
case, but in your specific case you have to run all TO handlers on the
same ordered workqueue anyway (because trying to paper over this in other
ways doesn't work imo).
I didn't get this one.
So, awhile back I tried to "serialize" this by moving timed-out jobs
into their own timed-out-dedicated list, then freeing them asynchronously,
but I never got it to work reliably due to races with low-level drivers and
assumptions made way back.

My idea was to atomic-move timed-out jobs into their own list, at the time of
timeout, and later asynchronously to free them (or better yet, inquire about
their state, and free them or move them back--ideally the inquiry is atomic
and done at timeout time before being moved to the timeout list). Anyway...

But I found out that all these knobs and levers weren't in place and I was
getting problems with it and it never materialized.

The paradigm was loosely "let someone else do it", like, "on an event,
move it to a list, and let someone else handle it", or "on an event, mark
it, and let someone else handle it". (loosely borrowed from an iSCSI target
I did many many years ago--it worked well and there were no races, even with
out-of-order executions.)

If you guys have any ideas to that end, maybe we can try it out.

Regards,
Luben


I wonder if we really need this serialization at all, if we do HW fence embedding at the drm_sched_job level instead of doing it only for amdgpu, and modifying all
the drivers to support this we can both remove this hack and solve the race
against concurrent drm_sched_cleanup_jobs job freeing just by taking reference
to the hw fence of the job at the beginning of drm_sched_job_timedout

Andrey




Andrey


So I think this should all work, no need for tricky cross-scheduler
locking.
-Daniel

Andrey


See

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdri.freedesktop.org%2Fdocs%2Fdrm%2Fgpu%2Fdrm-mm.html%23c.drm_sched_backend_ops&amp;data=04%7C01%7Cluben.tuikov%40amd.com%7C228bd1600c914efe24aa08d96c934bbb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637660202148713283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Fpt%2Btho2W4woHKQ861cEbBzoOidS6zuDhFi%2B1UTwWJg%3D&amp;reserved=0

for the ->timeout_job callback docs. I thought I brought this up already?
-Daniel
Yes, this discussion is a continuation of your comment about serializing, I
mentioned before that you proposed it.

Andrey


Andrey


     	spin_lock(&sched->job_list_lock);
     	job = list_first_entry_or_null(&sched->pending_list,
     				       struct drm_sched_job, list);
     	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-		 * is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
     		spin_unlock(&sched->job_list_lock);
+		/* vendor's timeout_job should call drm_sched_start() */
     		status = job->sched->ops->timedout_job(job);
     		/*
@@ -393,20 +391,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
     	kthread_park(sched->thread);
     	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
-	/*
     	 * Iterate the job list from later to  earlier one and either deactive
     	 * their HW callbacks or remove them from pending list if they already
     	 * signaled.



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux