On 03.08.23 16:43, Matthew Brost wrote:
On Thu, Aug 03, 2023 at 11:11:13AM +0100, Tvrtko Ursulin wrote:
On 01/08/2023 21:50, Matthew Brost wrote:
[SNIP]
sched->ops = ops;
sched->hw_submission_limit = hw_submission;
sched->name = name;
+ sched->run_wq = run_wq ? : system_wq;
I still think it is not nice to implicitly move everyone over to the shared
system wq. Maybe even more so now with one-at-a-time execution, since the
effect on latency can be even greater.
No one that has a stake in this has pushed back that I can recall. Open
to feedback from stakeholders (maintainers of drivers that use the DRM
scheduler).
No objections to using the system_wq here. Drivers can still pass in
their own or simply use system_highpri_wq instead.
In addition to that, the system_wq isn't single threaded; it will create
as many threads as needed to fully utilize all CPUs.
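For what it's worth, a driver that wants its own submission context can
create one with a single call. An untested sketch (the queue name is made
up, and the other drm_sched_init arguments are elided):

struct workqueue_struct *run_wq;

/* Ordered, driver-owned queue: submissions never share an execution
 * context with unrelated system work. */
run_wq = alloc_ordered_workqueue("my-drm-sched", 0);
if (!run_wq)
	return -ENOMEM;

/* Passing NULL instead falls back to system_wq, per the hunk above. */
drm_sched_init(sched, ops, ..., run_wq, ...);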
The i915 didn't use the DRM scheduler last time I looked.
Has that changed?
Have you considered kthread_work as a backend? Maybe it would work to have
callers pass in a kthread_worker they create, or provide a drm_sched helper
to create one, which would then be passed to drm_sched_init.
That would enable per driver kthread_worker, or per device, or whatever
granularity each driver would want/need/desire.
driver init:
struct drm_sched_worker *worker = drm_sched_create_worker(...);
queue/whatever init:
drm_sched_init(.., worker, ...);
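The create helper could be a thin wrapper over the existing kthread_worker
API, roughly (untested sketch; drm_sched_create_worker is the hypothetical
helper named above):

#include <linux/kthread.h>

struct kthread_worker *drm_sched_create_worker(struct device *dev)
{
	/* One dedicated kernel thread, exclusive to this driver or
	 * device; returns ERR_PTR() on failure. */
	return kthread_create_worker(0, "drm-sched/%s", dev_name(dev));
}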
This idea doesn't seem to work for a variety of reasons. I will type it
out if needed, but I'm not going to spend time on this unless someone with
a stake raises it as an issue.
Agree completely. kthread_work is for real-time workers IIRC.
You could create one inside drm_sched_init if not passed in, which would
keep the behaviour for existing drivers more similar - they would still have
a 1:1 kthread context for their exclusive use.
Part of the idea of a work queue is that a user can't directly create a
kthread via an IOCTL (XE_EXEC_QUEUE_CREATE). What you are suggesting
exposes this issue.
Yeah, preventing that is indeed a very good idea.
And I *think* self-re-arming would be less problematic latency-wise, since
kthread_worker consumes everything queued without relinquishing control, and
the execution context would be guaranteed not to be shared with random
system stuff.
So this is essentially so we can use a loop? Seems like a lot of effort
for what is pure speculation. Again, if a stakeholder raises an issue we
can address it then.
Instead of a loop, what you usually do in the worker is to submit one
item (if possible) and then re-queue yourself if there is more work to do.
This way you give others a chance to run as well, and the work can still
be cancelled, etc.
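In pseudo-code (untested; the work_run member and the two drm_sched_*
helpers here are made-up names):

static void drm_sched_run_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_run);

	/* Submit at most one job per invocation. */
	drm_sched_run_one_job(sched);

	/* Re-queue ourselves if there is more to do, so other work
	 * items get a chance to run and the work stays cancelable. */
	if (drm_sched_has_pending_jobs(sched))
		queue_work(sched->run_wq, w);
}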
Christian.
Matt
Regards,
Tvrtko