On 24.07.24 10:16, Tvrtko Ursulin wrote:
[SNIP]
Absolutely.
Absolutely good and absolutely me, or absolutely you? :)
You, I don't even have time to finish all the stuff I already started :/
These are the TODO points and their opens:
- Adjust amdgpu_ctx_set_entity_priority() to call
drm_sched_entity_modify_sched() regardless of the hw_type - to fix
priority changes on a single sched other than gfx or compute.
Either that, or stop using the scheduler priority to implement
userspace priorities and always use different HW queues for that.
- Document that the sched_list array lifetime must align with the
entity's, and adjust the callers.
Open:
Do you still oppose keeping sched_list for num_scheds == 1?
Not if you can fix up all callers.
If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing
with drm_sched_entity_init() on this detail, and that we keep the "one
shot single sched_list" quirk in? Why is that nicer than simply keeping
the list and removing that quirk? Once the lifetime rules are clear it
is IMO okay to always keep the list.
Yeah, if every caller of drm_sched_entity_init() can be fixed I'm fine
with that as well.
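For reference, the quirk in question is roughly this (quoted from memory,
so it may not match the current code exactly): drm_sched_entity_init()
only keeps the list around when there is more than one scheduler,

        entity->sched_list = num_sched_list > 1 ? sched_list : NULL;

while drm_sched_entity_modify_sched() always stores whatever it is given,
and drm_sched_entity_select_rq() then drops a single-element list after
its first use:

        if (entity->num_sched_list == 1)
                entity->sched_list = NULL;

So today a caller can get away with passing a short-lived single-element
array, which is exactly the lifetime question above.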
- Remove drm_sched_entity_set_priority().
Open:
Should we at this point also modify amdgpu_device_init_schedulers() to
stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?
One step at a time.
Once drm_sched_entity_set_priority() is gone there is little point.
Unless there are some non-obvious games with the kernel priority or
something.
In general, scheduler priorities were meant to be used for things
like kernel queues, which would always have higher priority than
user space submissions; using them for userspace turned out not to
be such a good idea.
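Roughly, that idea looks like this at entity init time (the entity and
list names are made up for illustration):

        /* kernel-internal work, e.g. buffer moves, always takes precedence */
        r = drm_sched_entity_init(&kernel_entity, DRM_SCHED_PRIORITY_KERNEL,
                                  sched_list, num_scheds, NULL);

        /* everything submitted by userspace stays at normal priority */
        r = drm_sched_entity_init(&user_entity, DRM_SCHED_PRIORITY_NORMAL,
                                  sched_list, num_scheds, NULL);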
Out of curiosity, what were the problems? I cannot think of
anything fundamental, since there are priorities at the backend
level after all, just fewer levels.
A higher-priority queue can starve lower-priority queues to the point
that they never get a chance to run anything.
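Something like this simplified selection loop (illustrative only, not the
actual drm_sched_select_entity(); pick_from_rq() is a made-up helper):

static struct drm_sched_entity *
pick_next(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_entity *entity;
        unsigned int i;

        for (i = 0; i < sched->num_rqs; i++) {  /* 0 == highest priority */
                entity = pick_from_rq(sched->sched_rq[i]);
                if (entity)
                        return entity;  /* lower run-queues never reached */
        }
        return NULL;
}

As long as the highest non-empty run-queue keeps producing jobs, the loop
never gets past it.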
Oh, that.. well, I call that an implementation detail. :) Because
nowhere in the uapi is it, I think, guaranteed that execution ordering
needs to be strictly in descending priority. This potentially goes back
to what you said above, that a larger rewrite might be beneficial -
implementing some smarter scheduling. Although the issue is that it is
just frontend scheduling after all, so it is a bit questionable to
invest in making it too smart.
+1, and we have a bug report complaining that RR is in at least one
situation better than FIFO, so it is actually quite hard to remove.
On the other hand, FIFO has some really nice properties as well. E.g.
try to run >100 glxgears instances (on weaker hw) and just visually
compare the differences between RR and FIFO: FIFO is baby smooth and
RR is basically stuttering all the time.
I think this goes back to what I was suggesting during the early xe
days: that the drm scheduler should potentially be split into a
dependency handling part and a scheduling part. Drivers with 1:1
entity:scheduler and full hardware/firmware scheduling do not really
need either FIFO or RR.
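For illustration, in such a 1:1 setup every queue boils down to something
like this (driver-side names made up), so the FIFO/RR policy never has
more than one entity to choose from and only the dependency handling part
of the scheduler does real work:

        /* assumed driver code: one scheduler per firmware queue */
        struct drm_gpu_scheduler *sched = &queue->sched;

        r = drm_sched_entity_init(&queue->entity, DRM_SCHED_PRIORITY_NORMAL,
                                  &sched, 1, NULL);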
Yeah that's my thinking as well and I also suggested that multiple
times in discussions with Sima and others.
This basically means that userspace gets a chance to submit
infinite fences, with all the bad consequences.
I mean one problem unrelated to this discussion is this:
void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
{
        struct dma_fence *fence;
        struct drm_gpu_scheduler *sched;
        struct drm_sched_rq *rq;

        /* queue non-empty, stay on the same engine */
        if (spsc_queue_count(&entity->job_queue))
                return;
Which makes it look like an entity with a constant trickle of
jobs will never actually manage to change its run queue. Neither
today, nor after the potential drm_sched_entity_set_priority()
removal.
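For reference, drm_sched_entity_set_priority() itself only records the
new value (quoted roughly from memory):

void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
                                   enum drm_sched_priority priority)
{
        spin_lock(&entity->rq_lock);
        entity->priority = priority;
        spin_unlock(&entity->rq_lock);
}

The stored priority is only consumed the next time
drm_sched_entity_select_rq() actually picks a run queue, which the early
return above skips for as long as job_queue stays non-empty.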
That's intentional and based on how the scheduler load balancing
works.
I see that it is intentional, but if it can silently prevent priority
changes (even hw priority changes) it is not very solid. Unless I am
missing something here.
IIRC the GSoC student who implemented this (with me as mentor)
actually documented that behavior somewhere.
And to be honest it kind of makes sense, because switching priorities
would otherwise be disruptive, e.g. you have a moment where you need
to drain all previous submissions with the old priority before you
can do new ones with the new priority.
Hmmm, I don't see how it makes sense. Perhaps a test case for
AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't
work, or at least how easily it can be defeated with the callers none
the wiser.
Ok, that needs a somewhat longer explanation. You don't by any chance have Teams?
For context, I am kind of interested because I wired up amdgpu to the
DRM cgroup controller and use the priority override to de-prioritize
certain cgroups, and it kind of works. But again, it will not be great
if a client with a constant trickle of submissions can just defeat it.
Yeah, exactly that use case is currently not possible :(
Regards,
Christian.
Regards,
Tvrtko