Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

Am 05.04.23 um 11:07 schrieb Daniel Vetter:
[SNIP]
I would approach it from the complete other side. This component here is a
tool to decide what job should run next.

How that is then signaled and run should not be part of the scheduler, but
of another, higher-level component.

This way you also don't have a problem with using something other than
DMA-fences as dependencies, or as constraints for running more jobs.
I think we're talking about two things here and mixing them up.

For the dependencies I agree with you, and imo that higher level tool
should probably just be an on-demand submit thread in userspace for the
rare case where the kernel would need to sort out a dependency otherwise
(due to running out of ringspace in the per-ctx ringbuffer).

The other thing is the message passing stuff, and this is what I was
talking about above. This has nothing to do with handling dependencies,
but with talking to the gpu fw. Here the intel design issue is that the fw
only provides a single queue, and it's in-order. Which means it
fundamentally has the stalling issue you describe as a point against a
message passing design. And fundamentally we need to be able to talk to
the fw in the scheduler ->run_job callback.

The proposal here for the message passing part is that since it has the
stalling issue already anyway, and the scheduler needs to be involved
anyway, it makes sense to integrate this (as an optional thing, only for
drivers which have this kind of fw interface) into the scheduler.
Otherwise you just end up with two layers for no reason and more ping-pong
delay because the ->run_job needs to kick off the subordinate driver layer
first. Note that for this case the optional message passing support in the
drm/scheduler actually makes things better, because it allows you to cut
out one layer.

Of course if a driver with better fw interface uses this message passing
support, then that's bad. Hence the big warning in the kerneldoc.

Well, what I wanted to say is that if you design the dependency handling / scheduler properly, you don't need to pass messages through it.

For example, if the GPU scheduler component uses a work item for its handling instead of a kthread, you could also let the driver specify the workqueue this work item is executed on.

When you design it like this, the driver specifies the thread context of execution for its jobs. In other words, it can specify a single-threaded firmware workqueue as well.

When you then have other messages which need to be passed to the firmware, you can use the same single-threaded workqueue for those as well.

Drivers which have a different firmware interface would just use one of the system workqueues instead.

This approach basically decouples the GPU scheduler component from the message passing functionality.

Regards,
Christian.


-Daniel