On 7/25/23 18:43, Danilo Krummrich wrote:
On 7/25/23 18:16, Faith Ekstrand wrote:
Thanks for the detailed write-up! That would definitely explain it. If
I remember, I'll try to do a single-threaded run or two. If your
theory is correct, there should be no real perf difference when
running single-threaded. Those runs will take a long time, though, so
I'll have to run them overnight. I'll let you know in a few days once
I have the results.
I can also push a separate branch where I just print out a warning
whenever we run into such a condition, including the time we spent
waiting for things to complete. I can probably push something later today.
https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-track-stalls
It prints out the duration of every wait as well as the total wait time
since boot.
- Danilo
If this theory holds, then I'm not concerned about the performance of
the API itself. It would still be good to see if we can find a way to
reduce the cross-process drag in the implementation but that's a perf
optimization we can do later.
From the kernel side, I think the only thing we could really do is to
temporarily run a secondary drm_gpu_scheduler instance, one for VM_BINDs
and one for EXECs, until we have the new page table handling in place.
However, the UMD could avoid such conditions more effectively, since it
controls the address space. Namely, avoid re-using the same region of
the address space right away in certain cases. For instance, instead of
replacing a sparse region A[0x0, 0x4000000] with a larger sparse region
B[0x0, 0x8000000], replace it with B'[0x4000000, 0xC000000] if possible.
However, I'm just mentioning this for completeness. The UMD probably
shouldn't work around such a kernel limitation, not even temporarily.
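
To make the address space idea above a bit more concrete, here is a rough
sketch of what "don't re-use the region right away" could look like in a
userspace VA allocator. This is not taken from any real UMD. The class
VaAllocator and its methods alloc(), free() and retire() are made up for
illustration, ranges are treated as half-open [start, end), and retire()
stands in for whatever point at which the driver knows the old unbinds
have completed.

```python
class VaAllocator:
    """Toy first-fit VA allocator (hypothetical, for illustration only).

    Freed ranges are kept in a "cooling" list and avoided by new
    allocations until retire() is called. If nothing else fits, the
    allocator falls back to re-using a cooling range.
    """

    def __init__(self, size):
        self.size = size
        self.used = []     # live ranges as (start, end), half-open
        self.cooling = []  # freed ranges whose unbind may still be pending

    @staticmethod
    def _overlaps(start, end, ranges):
        return any(start < e and s < end for s, e in ranges)

    def _first_fit(self, length, blocked):
        # The lowest free start is either 0 or the end of a blocked range.
        candidates = sorted({0} | {e for _, e in blocked})
        for start in candidates:
            end = start + length
            if end <= self.size and not self._overlaps(start, end, blocked):
                return start
        return None

    def alloc(self, length):
        # Preferred: a range that touches neither live nor cooling ranges.
        start = self._first_fit(length, self.used + self.cooling)
        if start is None:
            # Fallback: accept immediate re-use rather than failing.
            start = self._first_fit(length, self.used)
        if start is None:
            raise MemoryError("address space exhausted")
        end = start + length
        # Any cooling range we now overlap is in use again.
        self.cooling = [r for r in self.cooling
                        if not self._overlaps(start, end, [r])]
        self.used.append((start, end))
        return start

    def free(self, start):
        for r in self.used:
            if r[0] == start:
                self.used.remove(r)
                self.cooling.append(r)
                return
        raise KeyError(hex(start))

    def retire(self):
        # Assumed to be called once the pending unbinds have completed.
        self.cooling.clear()


va = VaAllocator(0x10000000)
a = va.alloc(0x4000000)   # region A at 0x0
va.free(a)
b = va.alloc(0x8000000)   # region B' is placed after A instead of over it
```

With these sizes, the second allocation ends up at 0x4000000 rather than
0x0, which matches the A vs. B' example. The fallback path means the
allocator never fails just because a range is still cooling; it only
gives up the optimization in that case.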
Anyway, before doing any of those, let's see if the theory holds and
we're actually running into such cases.
Does it actually matter? Yes, it kinda does. No, it probably doesn't
matter for games because you're typically only running one game at a
time. From a development PoV, however, if it makes CI take longer then
that slows down development and that's not good for the users, either.
Fully agree.
- Danilo
~Faith
- Danilo