[SNIP]The handling of NULL values is half-baked. In my opinion, you should define if drm_sched_pick_best() may put a NULL into rq. If your answer is yes, it might put a NULL there; then, there should be a BUG_ON(!entity->rq) after the invocation of drm_sched_entity_select_rq(). If your answer is no, the BUG_ON() should be in drm_sched_pick_best().Yeah good point. We might not want a BUG_ON(), that is only justified when we prevent further damage (e.g. random data corruption or similar). I suggest using a WARN(!shed, "Submission without activated sheduler!"). This way the system has at least a chance of survival should the scheduler become ready later on. On the other hand the BUG_ON() or the NULL pointer deref should only kill the application thread which is submitting something before the driver is resumed. So that might help to pinpoint where the actually issue is.As I see it the BUG_ON() would just be a more pretty NULL pointer deref. If we agree that this is effectively a misuse of the scheduler API we probably want to add it to make it more pretty, though?
The only alternative I can see is that the scheduler API gracefully handles submits to non-ready schedulers. E.g. that drm_sched_entity_push_job() detects this condition and instead of pushing the job sets and error code and signals the fences.
But that might not be a good idea.
It just moves the crash from one place to another and in general I fully agree the driver is misusing the scheduler API to do something which won't work and potentially crash the whole system.
@Philipp: BTW, I only just discovered this thread by coincidence. Please use get_maintainer. The scheduler currently has 4 maintainers, and none of them is on CC.
Oh good, point I was already wondering why nobody else commented and didn't realized that nobody was on CC.
Thanks,
Christian.
Danke, P.Regards, Christian.That helps guys with zero domain knowledge, like me, to figure out how this is all supposed to work. best regards, Philipp