On 28.10.21 at 19:26, Andrey Grodzovsky wrote:
On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote:
On 2021-10-27 10:50 a.m., Christian König wrote:
On 27.10.21 at 16:47, Andrey Grodzovsky wrote:
On 2021-10-27 10:34 a.m., Christian König wrote:
On 27.10.21 at 16:27, Andrey Grodzovsky wrote:
[SNIP]
Please let me know if I am still missing some point of yours.
Well, I mean we need to be able to handle this for all drivers.
For sure, but as I said above, in my opinion we need to change this
only for those drivers that don't use the _locked version.
And that absolutely won't work.
See, the dma_fence is a contract between drivers, so you need the
same calling convention between all drivers.
Either we always call the callback with the lock held or we always
call it without the lock, but calling it with the lock in some cases
and without it in others won't work.
Christian.
I am not sure I fully understand what problems this will cause, but
anyway, then we are back to irq_work. We cannot embed irq_work as a
union within dma_fence's cb_list
because it's already reused as the timestamp and as the rcu head after
the fence is signaled. So I will do it within drm_scheduler with a single
irq_work per drm_sched_entity,
as we discussed before.
That won't work either. We free up the entity after the cleanup
function. That's the reason we use the callback on the job in the
first place.
Yep, missed it.
We could overload the cb structure in the job though.
I guess so, since no one else is using this member after the cb has
executed.
Andrey
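The idea of overloading the cb structure in the job can be illustrated with a small userspace sketch. It is not the attached patch and uses none of the kernel APIs: `struct fence_cb`, `struct deferred_work`, `struct job`, and all function names below are hypothetical stand-ins for the fence callback, the irq_work, and the scheduler job. It only shows that the callback storage can be reused for deferred work once the callback has run, assuming nothing else touches that member afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's dma_fence_cb or irq_work. */
struct fence_cb {
    void (*func)(struct fence_cb *cb);
};

struct deferred_work {
    void (*func)(struct deferred_work *work);
};

struct job {
    int id;
    int killed;
    union {
        struct fence_cb finish_cb;  /* valid until the callback has run */
        struct deferred_work work;  /* valid only after that point */
    };
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A single pending slot stands in for a deferred-work queue. */
static struct deferred_work *pending;

/* Deferred part: runs later, outside the fence callback context. */
void job_kill_work(struct deferred_work *work)
{
    struct job *job = container_of(work, struct job, work);

    assert(job != NULL);
    job->killed = 1;
}

/* Fence callback: does no real work, only re-arms the same storage. */
void job_finish_cb(struct fence_cb *cb)
{
    struct job *job = container_of(cb, struct job, finish_cb);

    /* The callback member is dead from here on, so reuse its memory. */
    job->work.func = job_kill_work;
    pending = &job->work;
}

void job_arm(struct job *job)
{
    job->killed = 0;
    job->finish_cb.func = job_finish_cb;
}

/* Stand-in for the fence being signaled. */
void signal_fence(struct fence_cb *cb)
{
    cb->func(cb);
}

/* Stand-in for the deferred-work machinery running the queued item. */
void run_pending_work(void)
{
    struct deferred_work *work = pending;

    pending = NULL;
    if (work)
        work->func(work);
}
```

In this sketch a caller would arm the job with `job_arm()`, signal with `signal_fence(&job.finish_cb)`, and later call `run_pending_work()`; the job is marked killed only in the last step. Because the work item lives in the job rather than in the entity, it remains valid even if the entity is freed after cleanup.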
Attached a patch. Give it a try please. I tested it on my side and
tried to generate the right conditions to trigger this code path by
repeatedly submitting commands while issuing a GPU reset to stop the
scheduler, and then killing the command submission process in the middle.
But for some reason it looks like the job_queue was always already empty
at the time of the entity kill.
It was trivial to trigger with the stress utility I've hacked together:
amdgpu_stress -b v 1g -b g 1g -c 1 2 1g 1k
Then, while it is copying, just press Ctrl+C to kill it.
The patch itself is:
Tested-by: Christian König <christian.koenig@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
Thanks,
Christian.