On Thu, 17 Aug 2023 13:13:31 +0200 Danilo Krummrich <dakr@xxxxxxxxxx> wrote: > I think that's a misunderstanding. I'm not trying to say that it is > *always* beneficial to fill up the ring as much as possible. But I think > it is under certain circumstances, exactly those circumstances I > described for Nouveau. > > As mentioned, in Nouveau the size of a job is only really limited by the > ring size, which means that one job can (but does not necessarily) fill > up the whole ring. We both agree that this is inefficient, because it > potentially results into the HW run dry due to hw_submission_limit == 1. > > I recognize you said that one should define hw_submission_limit and > adjust the other parts of the equation accordingly, the options I see are: > > (1) Increase the ring size while keeping the maximum job size. > (2) Decrease the maximum job size while keeping the ring size. > (3) Let the scheduler track the actual job size rather than the maximum > job size. > > (1) results into potentially wasted ring memory, because we're not > always reaching the maximum job size, but the scheduler assumes so. > > (2) results into more IOCTLs from userspace for the same amount of IBs > and more jobs result into more memory allocations and more work being > submitted to the workqueue (with Matt's patches). > > (3) doesn't seem to have any of those draw backs. > > What would be your take on that? > > Actually, if none of the other drivers is interested into a more precise > way of keeping track of the ring utilization, I'd be totally fine to do > it in a driver specific way. However, unfortunately I don't see how this > would be possible. I'm not entirely sure, but I think PowerVR is pretty close to your description: jobs size is dynamic size, and the ring buffer size is picked by the driver at queue initialization time. What we did was to set hw_submission_limit to an arbitrarily high value of 64k (we could have used something like ringbuf_size/min_job_size instead), and then have the control flow implemented with ->prepare_job() [1] (CCCB is the PowerVR ring buffer). This allows us to maximize ring buffer utilization while still allowing dynamic-size jobs. > > My proposal would be to just keep the hw_submission_limit (maybe rename > it to submission_unit_limit) and add a submission_units field to struct > drm_sched_job. By default a jobs submission_units field would be 0 and > the scheduler would behave the exact same way as it does now. > > Accordingly, jobs with submission_units > 1 would contribute more than > one unit to the submission_unit_limit. > > What do you think about that? > > Besides all that, you said that filling up the ring just enough to not > let the HW run dry rather than filling it up entirely is desirable. Why > do you think so? I tend to think that in most cases it shouldn't make > difference. [1]https://gitlab.freedesktop.org/frankbinns/powervr/-/blob/powervr-next/drivers/gpu/drm/imagination/pvr_queue.c#L502