Re: [PATCH] drm/sched: Fix amdgpu crash upon suspend/resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 13.01.25 um 09:43 schrieb Philipp Stanner:
[SNIP]

        

        
The handling of NULL values is half-baked.

In my opinion, you should define if drm_sched_pick_best() may put a
NULL into
rq. If your answer is yes, it might put a NULL there; then, there
should be a
BUG_ON(!entity->rq) after the invocation of
drm_sched_entity_select_rq().
If your answer is no, the BUG_ON() should be in
drm_sched_pick_best().
Yeah good point.

We might not want a BUG_ON(), that is only justified when we prevent 
further damage (e.g. random data corruption or similar).

I suggest using a WARN(!shed, "Submission without activated
sheduler!"). 
This way the system has at least a chance of survival should the 
scheduler become ready later on.

On the other hand the BUG_ON() or the NULL pointer deref should only 
kill the application thread which is submitting something before the 
driver is resumed. So that might help to pinpoint where the actually 
issue is.
As I see it the BUG_ON() would just be a more pretty NULL pointer
deref. If we agree that this is effectively a misuse of the scheduler
API we probably want to add it to make it more pretty, though?

The only alternative I can see is that the scheduler API gracefully handles submits to non-ready schedulers. E.g. that drm_sched_entity_push_job() detects this condition and instead of pushing the job sets and error code and signals the fences.

But that might not be a good idea.

It just moves the crash from one place to another and in general I fully agree the driver is misusing the scheduler API to do something which won't work and potentially crash the whole system.

@Philipp:
BTW, I only just discovered this thread by coincidence. Please use
get_maintainer. The scheduler currently has 4 maintainers, and none of
them is on CC.

Oh good, point I was already wondering why nobody else commented and didn't realized that nobody was on CC.

Thanks,
Christian.


Danke,
P.

Regards,
Christian.

That helps guys with zero domain knowledge, like me, to figure out
how
this is all
supposed to work.

best regards,
  Philipp

      

    


[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux