On 24.07.24 10:16, Tvrtko Ursulin wrote:
[SNIP]
Absolutely.
Absolutely good and absolutely me, or absolutely you? :)
You, I don't even have time to finish all the stuff I already started :/
These are the TODO points and their opens:
- Adjust amdgpu_ctx_set_entity_priority() to call
drm_sched_entity_modify_sched() regardless of the hw_type - to fix
priority changes on a single sched other than gfx or compute.
Either that, or stop using the scheduler priority to implement
userspace priorities and always use different HW queues for that.
- Document that the sched_list array lifetime must align with the
entity's, and adjust the callers.
Open:
Do you still oppose keeping sched_list for num_scheds == 1?
Not if you can fix up all callers.
If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing
with drm_sched_entity_init() on this detail, and that we keep the "one
shot single sched_list" quirk in? Why is that nicer than simply keeping
the list and removing that quirk? Once the lifetime rules are clear it
is IMO okay to always keep the list.
Yeah, if every caller of drm_sched_entity_init() can be fixed I'm fine
with that as well.
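For reference, the quirk in question is roughly this (quoted from memory,
so it may not match the current code exactly): drm_sched_entity_init()
only keeps the list around when there is more than one scheduler,

        entity->sched_list = num_sched_list > 1 ? sched_list : NULL;

while drm_sched_entity_modify_sched() always stores whatever it is given,
and drm_sched_entity_select_rq() then drops a single-element list after
its first use:

        if (entity->num_sched_list == 1)
                entity->sched_list = NULL;

So today a caller can get away with passing a short-lived single-element
array, which is exactly the lifetime question above.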
- Remove drm_sched_entity_set_priority().
Open:
Should we at this point also modify amdgpu_device_init_schedulers() to
stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?
One step at a time.
Once drm_sched_entity_set_priority() is gone there is little point.
Unless there are some non-obvious games with the kernel priority or
something.
In general, scheduler priorities were meant to be used for things
like kernel queues, which would always have higher priority than
user space submissions; using them for userspace turned out not to
be such a good idea.
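Roughly, that idea looks like this at entity init time (the entity and
list names are made up for illustration):

        /* kernel-internal work, e.g. buffer moves, always takes precedence */
        r = drm_sched_entity_init(&kernel_entity, DRM_SCHED_PRIORITY_KERNEL,
                                  sched_list, num_scheds, NULL);

        /* everything submitted by userspace stays at normal priority */
        r = drm_sched_entity_init(&user_entity, DRM_SCHED_PRIORITY_NORMAL,
                                  sched_list, num_scheds, NULL);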
Out of curiosity, what were the problems? I cannot think of
anything fundamental, since there are priorities at the backend
level after all, just fewer levels.
A higher-priority queue can starve lower-priority queues to the point
that they never get a chance to run anything.
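Something like this simplified selection loop (illustrative only, not the
actual drm_sched_select_entity(); pick_from_rq() is a made-up helper):

static struct drm_sched_entity *
pick_next(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_entity *entity;
        unsigned int i;

        for (i = 0; i < sched->num_rqs; i++) {  /* 0 == highest priority */
                entity = pick_from_rq(sched->sched_rq[i]);
                if (entity)
                        return entity;  /* lower run-queues never reached */
        }
        return NULL;
}

As long as the highest non-empty run-queue keeps producing jobs, the loop
never gets past it.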
Oh, that.. well, I call that an implementation detail. :) Because
nowhere in the uapi is it, I think, guaranteed that execution ordering
needs to be strictly in descending priority. This potentially goes back
to what you said above, that a larger rewrite might be beneficial -
implementing some smarter scheduling. Although the issue is that it is
just frontend scheduling after all, so it is a bit questionable to
invest in making it too smart.
+1, and we have a bug report complaining that RR is in at least one
situation better than FIFO, so it is actually quite hard to remove.
On the other hand, FIFO has some really nice properties as well. E.g.
try to run >100 glxgears instances (on weaker hw) and just visually
compare the differences between RR and FIFO: FIFO is baby smooth and
RR is basically stuttering all the time.
I think this goes back to what I was suggesting during the early xe
days: that the drm scheduler should potentially be split into a
dependency handling part and a scheduling part. Drivers with 1:1
entity:scheduler and full hardware/firmware scheduling do not really
need either FIFO or RR.
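For illustration, in such a 1:1 setup every queue boils down to something
like this (driver-side names made up), so the FIFO/RR policy never has
more than one entity to choose from and only the dependency handling part
of the scheduler does real work:

        /* assumed driver code: one scheduler per firmware queue */
        struct drm_gpu_scheduler *sched = &queue->sched;

        r = drm_sched_entity_init(&queue->entity, DRM_SCHED_PRIORITY_NORMAL,
                                  &sched, 1, NULL);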
Yeah that's my thinking as well and I also suggested that multiple
times in discussions with Sima and others.
This basically means that userspace gets a chance to submit
infinite fences, with all the bad consequences.
I mean one problem unrelated to this discussion is this:
void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
{
        struct dma_fence *fence;
        struct drm_gpu_scheduler *sched;
        struct drm_sched_rq *rq;

        /* queue non-empty, stay on the same engine */
        if (spsc_queue_count(&entity->job_queue))
                return;
Which makes it look like an entity with a constant trickle of
jobs will never actually manage to change its run queue. Neither
today, nor after the potential drm_sched_entity_set_priority()
removal.
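For reference, drm_sched_entity_set_priority() itself only records the
new value (quoted roughly from memory):

void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
                                   enum drm_sched_priority priority)
{
        spin_lock(&entity->rq_lock);
        entity->priority = priority;
        spin_unlock(&entity->rq_lock);
}

The stored priority is only consumed the next time
drm_sched_entity_select_rq() actually picks a run queue, which the early
return above skips for as long as job_queue stays non-empty.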
That's intentional and based on how the scheduler load balancing
works.
I see that it is intentional, but if it can silently prevent priority
changes (even hw priority changes) it is not very solid. Unless I am
missing something here.
IIRC the GSoC student who implemented this (with me as mentor)
actually documented that behavior somewhere.
And to be honest it kind of makes sense, because switching priorities
would otherwise be disruptive, e.g. you have a moment where you need
to drain all previous submissions with the old priority before you
can do new ones with the new priority.
Hmmm, I don't see how it makes sense. Perhaps a test case for
AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't
work, or at least how easily it can be defeated with the callers none
the wiser.
Ok, that needs a somewhat longer explanation. You don't by any chance have Teams?
For context, I am kind of interested because I wired up amdgpu to the
DRM cgroup controller and use the priority override to de-prioritize
certain cgroups, and it kind of works. But again, it will not be great
if a client with a constant trickle of submissions can just defeat it.
Yeah, exactly that use case is currently not possible :(
Regards,
Christian.
Regards,
Tvrtko