On 21.08.23 at 21:07, Danilo Krummrich wrote:
On 8/21/23 20:12, Christian König wrote:
On 21.08.23 at 20:01, Danilo Krummrich wrote:
On 8/21/23 16:07, Christian König wrote:
On 18.08.23 at 13:58, Danilo Krummrich wrote:
[SNIP]
I only see two possible outcomes:
1. You return -EBUSY (or a similar error code) indicating that the
hw can't receive more commands.
2. Wait on previously pushed commands to be executed.
(3. Your driver crashes because you accidentally overwrite stuff in
the ring buffer which is still being executed. I just assume that's
prevented.)
Resolution #1 with -EBUSY is actually something the UAPI should
not do, because your UAPI then depends on the specific timing of
submissions, which is a really bad idea.
Resolution #2 is usually bad because it forces the hw to run dry
between submissions and so degrades performance.
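To make the timing dependency of resolution #1 concrete, here is a minimal sketch. It assumes a hypothetical ring model in which a push fails with -EBUSY whenever the commands the hw has not executed yet leave too little room; `struct ring`, `ring_try_push()` and `ring_retire()` are illustrative names, not a real driver or DRM scheduler API.

```c
#include <errno.h>

/* Hypothetical ring model for illustration only; sizes are in arbitrary
 * units (e.g. dwords). This is not a real driver or DRM API. */
struct ring {
	unsigned int size; /* total capacity of the ring */
	unsigned int used; /* space held by commands the hw has not executed yet */
};

/* Resolution #1: refuse the push with -EBUSY if the commands do not fit. */
static int ring_try_push(struct ring *r, unsigned int len)
{
	if (len > r->size - r->used)
		return -EBUSY;
	r->used += len;
	return 0;
}

/* Called once the hw has finished executing 'len' units of earlier
 * commands; the caller must ensure len <= r->used. */
static void ring_retire(struct ring *r, unsigned int len)
{
	r->used -= len;
}
```

With a 64-unit ring still holding 48 unexecuted units, `ring_try_push(&r, 32)` returns -EBUSY; after `ring_retire(&r, 16)` the identical call succeeds. Whether userspace sees an error therefore depends only on how far the GPU has progressed at that moment.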
I agree, that is a good reason for at least limiting the maximum
job size to half of the ring size.
However, there could still be cases where two subsequent jobs are
submitted with just a single IB each, which, as things stand, would still block
subsequent jobs from being pushed to the ring although there is still
plenty of space. Depending on the (CPU) scheduler latency, such a
case can let the HW run dry as well.
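The case above can be sketched as follows, assuming a hypothetical scheduler that charges every job the worst case (`max_job_size`) and therefore only counts jobs. `struct sched_fixed` and its helpers are illustrative names, not the actual DRM scheduler interface.

```c
#include <stdbool.h>

/* Hypothetical scheduler state that charges every job the worst case
 * (max_job_size), regardless of how much ring space it really uses. */
struct sched_fixed {
	unsigned int ring_size;    /* ring capacity in arbitrary units */
	unsigned int max_job_size; /* upper bound for a single job */
	unsigned int in_flight;    /* jobs pushed but not yet completed */
};

/* Number of jobs that may be in flight at once under worst-case accounting. */
static unsigned int sched_fixed_limit(const struct sched_fixed *s)
{
	return s->ring_size / s->max_job_size;
}

/* Only the job count matters here; the actual job size is never looked at. */
static bool sched_fixed_can_push(const struct sched_fixed *s)
{
	return s->in_flight < sched_fixed_limit(s);
}
```

With `ring_size = 1024` and `max_job_size = 512` (half the ring), the limit is two jobs: one can execute while the next is already queued. But once two single-IB jobs of, say, 16 units each are in flight, `sched_fixed_can_push()` returns false although only 32 of 1024 units are actually in use.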
Yeah, that was intentionally not done either. The crux here is
that the more you push to the hw, the worse the scheduling
granularity becomes. It's just that neither Xe nor Nouveau relies
that much on the scheduling granularity at all (because of hw queues).
But Xe doesn't seem to need that feature, and I would still try to
avoid it because the more you have pushed to the hw, the harder it
is to get going again after a reset.
Sure, we could just continue to decrease the maximum job size even
further, but this would result in further overhead in both userspace and the
kernel for larger IB counts. Tracking the actual job size seems to
be the better solution for drivers where the job size can vary
over a rather huge range.
I strongly disagree with that. A larger ring buffer is trivial to
allocate
That sounds like a workaround to me. The problem, in the case above,
isn't that the ring buffer does not have enough space, the problem
is that we account for the maximum job size although the actual job
size is much smaller. And enabling the scheduler to track the actual
job size is trivial as well.
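A minimal sketch of what tracking the actual job size could look like, assuming the driver knows each job's size (e.g. derived from its IB count) when the job is created. `struct sched_actual` and its helpers are hypothetical illustrations; the thread does not show the actual scheduler interface.

```c
#include <stdbool.h>

/* Hypothetical scheduler state that charges each job the ring space it
 * actually needs, as declared by the driver when the job is created. */
struct sched_actual {
	unsigned int capacity; /* ring capacity in arbitrary units */
	unsigned int used;     /* units held by jobs that have not completed */
};

/* Admit the job only if its actual size fits into the remaining space. */
static bool sched_actual_push(struct sched_actual *s, unsigned int job_size)
{
	if (job_size > s->capacity - s->used)
		return false;
	s->used += job_size;
	return true;
}

/* Return the job's units once the hw has finished executing it. */
static void sched_actual_complete(struct sched_actual *s, unsigned int job_size)
{
	s->used -= job_size;
}
```

In a 1024-unit ring, three 16-unit jobs are all admitted (48 units in use), while a 1000-unit job is still rejected until earlier jobs complete. A driver that prefers worst-case accounting could simply declare every job with the maximum size.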
That much I agree with; so far I just haven't seen a reason for doing
it, but I have seen at least a few reasons for not doing it.
and if userspace submissions are so small that the scheduler can't
keep up submitting them, then your ring buffer size is your smallest
problem.
In other words the submission overhead will completely kill your
performance and you should probably consider stuffing more into a
single submission.
I fully agree and that is also the reason why I want to keep the
maximum job size as large as possible.
However, afaik with Vulkan it's the applications themselves that decide
when, and with how many command buffers, a queue is submitted (@Faith:
please correct me if I'm wrong). Hence, why not optimize for this
case as well? It's not as if it would make another case worse, right?
As I said it does make both the scheduling granularity as well as the
reset behavior worse.
As you already mentioned, Nouveau (and Xe) don't really rely much on
scheduling granularity. For Nouveau, the same is true for the reset
behavior; if things go south, the channel is killed anyway. Userspace
would just request a new ring in this case.
Hence, I think Nouveau would profit from accounting the actual job
size. At the same time, other drivers that benefit from always
accounting for the maximum job size would still do so by default.
Arbitrary ratios of how much the job size contributes to the ring
being considered full would also be possible.
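One possible reading of such a ratio, assumed here purely for illustration, is a per-driver knob that blends the actual and the maximum job size into the amount charged against the ring. `job_charge()` and `actual_pct` are hypothetical names, not part of any existing API.

```c
#include <assert.h>

/* Hypothetical policy knob: 'actual_pct' says how much of a job's actual
 * size counts towards the ring being full.
 *   0   -> always charge max_size (the default worst-case behaviour)
 *   100 -> charge only the actual size
 * Values in between charge a linear blend of the two, rounded up so that
 * integer division never charges less than the exact blend. */
static unsigned int job_charge(unsigned int actual, unsigned int max_size,
			       unsigned int actual_pct)
{
	assert(actual <= max_size && actual_pct <= 100);
	return (actual * actual_pct + max_size * (100 - actual_pct) + 99) / 100;
}
```

For a 16-unit job with a 512-unit maximum, `actual_pct = 0` charges 512 (today's behaviour), `100` charges 16, and `50` charges 264. Defaulting the knob to 0 keeps existing drivers unchanged.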
That would indeed be rather interesting since for a bunch of drivers the
limiting part is not the ring buffer size, but rather the utilization of
engines.
But I have no idea how to design that properly. You would have multiple values
to check instead of just one.
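A minimal sketch of what checking several values could look like, assuming each limiting resource (ring space, some abstract engine-utilization estimate) is modelled as an independent budget. `struct budget` and `admit_job()` are hypothetical; how engine utilization would actually be measured is left open here, as it is in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget for one limiting resource, e.g. ring space or an
 * abstract estimate of engine utilization. */
struct budget {
	unsigned int capacity;
	unsigned int used;
};

/* Admit a job only if its cost fits into every budget. All budgets are
 * checked before any is charged, so a rejected job changes nothing. */
static bool admit_job(struct budget *b, const unsigned int *cost, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cost[i] > b[i].capacity - b[i].used)
			return false;
	for (size_t i = 0; i < n; i++)
		b[i].used += cost[i];
	return true;
}
```

With a ring budget of 1024 units and an engine budget of 100 units already at 90, a job costing {16, 5} is admitted, while a job costing {16, 20} is rejected by the engine budget alone even though the ring has plenty of room. Every admission decision now loops over several values instead of comparing one.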
Christian.
- Danilo
In general I think we should try to push just enough work to the
hardware to keep it busy and not as much as possible.
So as long as nobody from userspace comes and says we absolutely need
to optimize this use case, I would rather not do it.
Regards,
Christian.
- Danilo
Regards,
Christian.
- Danilo