Re: [PATCH drm-misc-next v3] drm/sched: implement dynamic job-flow control

Christian König <christian.koenig@xxxxxxx> · Tue, 31 Oct 2023 14:20:50 +0100

    Hi Danilo,

    sorry for splitting up the mail thread. I had to fetch this mail
    from my spam folder for some reason.

    Am 30.10.23 um 18:56 schrieb Danilo
      Krummrich:

      Hi Christian,

[SNIP]

          And yes, we can live with the overhead of making jobs
slightly bigger than they actually are, thus potentially delaying their
execution. It's just that I don't quite understand the rational behind
this conservatism, as I don't really see what negative impact this extra
->update_job_credits() call in the credit checking path has, other than
the slight overhead of an if-check for drivers that don't need it.

        >From experience it showed that we should not make the scheduler more
complicated than necessary. And I still think that the ring buffers only
need to be filled enough to keep the hardware busy.

      Right, and this callback contributes to exactly that.

I don't really think there is much to worry about in terms of introducing more
complexity. The implementation behind this callback is fairly trivial - it's
simply called right before we check whether a job fits on the ring, to fetch
the job's actual size.

I would agree if the implementation of that would be bulky, tricky and asking
for a compromise. But, since it actually is simple and straight forward I really
think that if we implement some kind of dynamic job-flow control it should be
implemented as acurate as possible rather than doing it half-baked.

    Yeah, I see the intention here and can perfectly relate to it it's
    just that I have prioritize other things.

    Adding this callback allows for the driver to influence the job
    submission and while this might seems useful now I'm just to much of
    a burned child to do stuff like this without having a really good
    reason for it.

        If this here has some measurable positive effect then yeah we should
probably do it, but as long as it's only nice to have I have some objections
to that.

      Can't answer this, since Nouveau doesn't support native fence waits. However, I
guess it depends on how many native fences a job carries. So, if we'd have two
jobs with each of them carrying a lot of native fences, but not a lot of actual
work, I can very well imagine that over-accounting can have a measureable
impact.

    What I can imagine as well is things like the hardware or firmware
    is looking at the fullness of the ring buffer to predict how much
    pending work there is.

      I just wonder if we really want to ask for real measurements given that the
optimization is rather trivial and cheap and we already have enough evidence
that it makes sense conceptually.

    Well that's the point I disagree on, this callback isn't trivial. If
    (for example) the driver returns a value larger than initially
    estimated for the job we can run into an endless loop.

    It's just one more thing which can go boom. At bare minimum we
    should check that the value is always decreasing.

    Christian.

- Danilo

        Regards,
Christian.

          Regards,

Boris