Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback

Christian König <christian.koenig@xxxxxxx> · Wed, 8 Mar 2023 21:14:34 +0100

Am 08.03.23 um 20:45 schrieb Asahi Lina:
On 09/03/2023 04.12, Christian König wrote:
Am 08.03.23 um 20:05 schrieb Asahi Lina:
[SNIP]
Well it's not the better way, it's the only way that works.

I have to admit that my bet on your intentions was wrong, but even that
use case doesn't work correctly.

See when your callback returns false it is perfectly possible that all
hw fences are signaled between returning that information and processing it.

The result would be that the scheduler goes to sleep and never wakes up
again.
That can't happen, because it will just go into another iteration of the
drm_sched main loop since there is an entity available still.

Rather there is probably the opposite bug in this patch: the can_run_job
logic should be moved into the wait_event_interruptible() condition
check, otherwise I think it can end up busy-looping since the condition
itself can be true even when the can_run_job check blocks it.

But there is no risk of it going to sleep and never waking up because
job completions will wake up the waitqueue by definition, and that
happens after the driver-side queues are popped. If this problem could
happen, then the existing hw_submission_limit logic would be broken in
the same way. It is logically equivalent in how it works.

Basically, if properly done in wait_event_interruptible, it is exactly
the logic of that macro that prevents this race condition and makes
everything work at all. Without it, drm_sched would be completely broken.

As I said we exercised those ideas before and yes this approach here
came up before as well and no it doesn't work.
It can never deadlock with this patch as it stands (though it could busy
loop), and if properly moved into the wait_event_interruptible(), it
would also never busy loop and work entirely as intended. The actual API
change is sound.

I don't know why you're trying so hard to convince everyone that this
approach is fundamentally broken... It might be a bad idea for other
reasons, it might encourage incorrect usage, it might not be the best
option, there are plenty of arguments you can make... but you just keep
trying to make an argument that it just can't work at all for some
reason. Why? I already said I'm happy dropping it in favor of the fences...
Well because it is broken.

When you move the check into the wait_event_interruptible condition then
who is going to call wait_event_interruptible when the condition changes?
I think you mean wake_up_interruptible(). That would be
drm_sched_job_done(), on the fence callback when a job completes, which
as I keep saying is the same logic used for
hw_rq_count/hw_submission_limit tracking.
As the documentation to wait_event says:

 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.

So what you essentially try to do here is to skip that and say 
drm_sched_job_done() would call that anyway, but when you read any 
variable to determine that state then as far as I can see nothing is 
guarantying that order.
The only other possibility how you could use the callback correctly 
would be to call drm_fence_is_signaled() to query the state of your hw 
submission from the same fence which is then signaled. But then the 
question is once more why you don't give that fence directly to the 
scheduler?
Please think about it for a second,
Yeah, I'm trying to really follow your intentions here. But that doesn't 
really makes sense.
Either you are trying to do something invalid or you are trying to 
circumvent the object model somehow and add a shortcut for the signaling 
API. Both would be more than fishy.
Regards,
Christian.

  it's really not that complicated to
see why it works:

- Driver pops off completed commands <-- can_run_job condition satisfied
- Driver signals fence
  - drm_sched_job_done_cb()
   - drm_sched_job_done()
    - atomic_dec(&sched->hw_rq_count); <-- hw_submission_limit satisfied
    - ...
    - wake_up_interruptible(&sched->wake_up_worker);
       ^- happens after both conditions are potentially satisfied

It really is completely equivalent to just making the hw_rq_count logic
customizable by the driver. The actual flow is the same. As long as the
driver guarantees it satisfies the can_run_job() condition before
signaling the completion fence that triggered that change, it works fine.

As I said this idea came up before and was rejected multiple times.
Maybe it was a different idea, or maybe it was rejected for other
reasons, or maybe it was wrongly rejected for being broken when it isn't ^^

~~ Lina