Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx> · Tue, 15 Oct 2024 14:14:35 +0100

On 15/10/2024 12:38, Philipp Stanner wrote:
On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote:

On 15/10/2024 08:11, Philipp Stanner wrote:
On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:

On 14/10/2024 12:32, Philipp Stanner wrote:
Hi,

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>

In FIFO mode We can avoid dropping the lock only to
immediately
re-
acquire
by adding a new drm_sched_rq_update_fifo_locked() helper.

Please write detailed commit messages, as described here [1].
      1. Describe the problem: current state and why it's bad.
      2. Then, describe in imperative (present tense) form what
the
commit
         does about the problem.

Both pieces of info are already there:

1. Drops the lock to immediately re-acquire it.
2. We avoid that by by adding a locked helper.
Optionally, in between can be information about why it's solved
this
way and not another etc.

Applies to the other patches, too.

[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Thanks I am new here and did not know this.

Seriosuly, lets not be too blindly strict about this because it
can
get
IMO ridiculous.

One example when I previously accomodated your request is patch
3/5
from
this series:

"""
Current kerneldoc for struct drm_sched_rq incompletely documents
what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by
the
lock.
"""

While this was the original commit text you weren't happy with:

"""
drm/sched: Re-order struct drm_sched_rq members for clarity

Lets re-order the members to make it clear which are protected by
the
lock
and at the same time document it via kerneldoc.
"""

I maintain the original text was passable.

On top, this was just a respin to accomodate the merge process.
All
approvals were done and dusted couple weeks or so ago so asking
for
yet
another respin for such trivial objections is not great.

I understand that you're unhappy, but please understand the
position
I'm coming from. As you know, since you sent these patches within a
different series (and, thus, since I reviewed them), I was trusted
with
co-maintaining this piece of shared infrastructure.

And since you've worked on it a bit now, I suppose you also know
that
the GPU Scheduler is arguably in quite a bad shape, has far too
little
documentation, has leaks, maybe race conditions, parts *where the
locking rules are unclear* and is probably only fully understood by
a
small hand full of people. I also argue that this is a *very*
complicated piece of software.

We already went over that and agreed. Not least I agreed the base is
shaky since few years  ago. :)

Btw if things align, I hope you will at some point see a follow up
series from me which makes some significant simplifications and
improvements at the same time.

Cool, good to hear!
(Would be even cooler if simplifications and improvements can be
delivered through separate patch series to be easier to review etc.)

Yes, when I spot something I pull it ahead and/or standalone when it 
makes sense. But it is early days and a big job.

So I might be or appear to be a bit pedantic, but I'm not doing
that to
terrorize you, but because I want this thing to become well
documented,
understandable, and bisectable. Working towards a canonical, idiot-
proof commit style is one measure that will help with that.

I want to offer you the following: I can be more relaxed with
things
universally recognized as trivial (comment changes, struct member
reordering) – but when something like a lock is touched in any way,
we
shall document that in the commit message as canonically as
possible,
so someone who's less experienced and just bisected the commit
immediately understands what has been done (or rather: was supposed
to
be done).

So how would you suggest to expand this commit text so it doesn't
read
too self-repeating?

My issue with this particular commit message is mainly that it doesn't
make it obvious what the patch is supposed to do. So one can make it
quicker and better to review by detailing it a bit more, so the
reviewer then can compare commit message vs. what the code does. It
seems to me for example that the actual optimization is being done in
drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be
ported, too, for correctness

"It seems" aka the commit title says so. ;)

Another small thing that might be cool is something that makes it a bit
more obvious that this is an optimization, not a fix.

So I would probably write:

"So far, drm_sched_rq_update_fifo() automatically takes
drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is
inefficient because that lock is then taken, released and retaken in
drm_sched_entity_push_job().

Improve performance by moving the locking out of
drm_sched_rq_update_fifo()."

Not that much longer but makes it far more obvious what shall be
achieved where :]

How about this:

"""
In FIFO mode (which is the default), both drm_sched_entity_push_job() 
and drm_sched_rq_update_fifo(), where the latter calls the former, are 
currently taking and releasing the same entity->rq_lock.

We can avoid that design inelegance, and also have a miniscule 
efficiency improvement on the idle submit path, by introducing a new 
drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking 
to its callers.
"""

(Let me read through the other patches briefly. Then we should be good
with v2 of this series.. or would it be v3 then?)

Depends how you count. By unique series titles or by fundamental content.

Regards,

Tvrtko