On 6/26/20 2:24 AM, Dmitry Osipenko wrote:
25.06.2020 12:16, Mikko Perttunen wrote:
On 6/25/20 2:11 AM, Dmitry Osipenko wrote:
23.06.2020 15:09, Mikko Perttunen wrote:
/* Command is an opcode gather from a GEM handle */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
/* Command is an opcode gather from a user pointer */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR 1
I'm a bit dubious about whether we really need to retain the non-UPTR
variant. The memory-copying overhead is negligible because the cmdstream's
data is usually hot in the CPU's cache.
IIRC, most (if not all) modern DRM drivers take the cmdstream only
through a user pointer.
At least, there isn't any real-world userspace example today that could
benefit from a non-UPTR variant.
I'm suggesting to leave out the non-UPTR gather variant for now, keeping
it in mind as a potential future extension of the submission UAPI. Any
objections?
Sure, we should be able to drop it. Downstream userspace is using it,
but we should be able to fix that. I was thinking that we can directly
map the user pages and point the gather to them without copying - that
way we wouldn't need to make DMA allocations inside the driver for every
submit.
We will need to create a Host1x DMA pool and then the dynamic
allocations will be cheap. This is an implementation detail that we can
discuss separately.
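
To illustrate why a pool makes the per-submit allocations cheap, here is a
minimal userspace sketch: one large allocation is made up front and then
handed out as fixed-size slots in O(1). All names (gather_pool and friends)
are hypothetical, malloc() merely stands in for a DMA-coherent allocation,
and the thread does not say how the driver would actually implement its pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; a userspace model, not driver code. */
struct gather_pool {
    unsigned char *base;   /* one large allocation made up front */
    size_t block_size;     /* size of each gather slot in bytes */
    size_t nblocks;        /* total number of slots */
    size_t *free_stack;    /* indices of the currently free slots */
    size_t free_top;       /* number of valid entries in free_stack */
};

static int gather_pool_init(struct gather_pool *p, size_t block_size,
                            size_t nblocks)
{
    if (block_size == 0 || nblocks == 0)
        return -1;

    /* malloc() stands in for the single expensive DMA allocation. */
    p->base = malloc(block_size * nblocks);
    p->free_stack = malloc(nblocks * sizeof(*p->free_stack));
    if (!p->base || !p->free_stack) {
        free(p->base);
        free(p->free_stack);
        return -1;
    }

    p->block_size = block_size;
    p->nblocks = nblocks;
    p->free_top = nblocks;
    for (size_t i = 0; i < nblocks; i++)
        p->free_stack[i] = nblocks - 1 - i;   /* slot 0 is handed out first */
    return 0;
}

/* O(1): pop a free slot, or return NULL when the pool is exhausted. */
static void *gather_pool_alloc(struct gather_pool *p)
{
    if (p->free_top == 0)
        return NULL;
    return p->base + p->free_stack[--p->free_top] * p->block_size;
}

/* O(1): push the slot back. The pointer must come from this pool. */
static void gather_pool_free(struct gather_pool *p, void *block)
{
    size_t idx = (size_t)((unsigned char *)block - p->base) / p->block_size;

    assert(idx < p->nblocks && p->free_top < p->nblocks);
    p->free_stack[p->free_top++] = idx;
}

static void gather_pool_destroy(struct gather_pool *p)
{
    free(p->base);
    free(p->free_stack);
}
```

With such a pool, each submit costs a stack pop and a push instead of a
fresh allocation, and a freed slot is the next one to be reused.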
We will need the UPTR variant anyway for the older Tegras because we need
to validate the cmdstreams, and it's much more efficient to copy from a
UPTR than from uncacheable memory.
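
A rough sketch of the copy-then-validate pattern, for illustration only.
The words are copied once from the user pointer into a driver-owned buffer,
and the checks then run on that copy, so all reads hit cached memory and
userspace cannot change a word after it has been checked. The opcode layout,
OP_INCR and MAX_OFFSET below are simplified assumptions rather than the real
Host1x encoding, and memcpy() stands in for the kernel's copy_from_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative encoding only (an assumption, not taken from the thread):
 * opcode in bits 31:28, register offset in bits 27:16, payload word
 * count in bits 15:0.
 */
#define CMD_OP(word)     ((word) >> 28)
#define CMD_OFFSET(word) (((word) >> 16) & 0xfffu)
#define CMD_COUNT(word)  ((word) & 0xffffu)

#define OP_INCR     1u      /* hypothetical: write COUNT words from OFFSET up */
#define MAX_OFFSET  0x100u  /* hypothetical limit on reachable registers */

/* Walk the copied stream and reject anything outside the allowed subset. */
static bool validate_words(const uint32_t *w, size_t n)
{
    size_t i = 0;

    while (i < n) {
        uint32_t cmd = w[i++];
        uint32_t count = CMD_COUNT(cmd);

        if (CMD_OP(cmd) != OP_INCR)
            return false;               /* opcode not allowed */
        if (count > n - i)
            return false;               /* payload runs past the end */
        if (CMD_OFFSET(cmd) + count > MAX_OFFSET)
            return false;               /* write outside permitted range */
        i += count;                     /* skip over the payload words */
    }
    return true;
}

/* Copy first, then validate the private copy, never the user's buffer. */
static bool copy_and_validate(uint32_t *dst, const uint32_t *user_src,
                              size_t n)
{
    assert(dst != NULL && user_src != NULL);
    memcpy(dst, user_src, n * sizeof(*dst)); /* copy_from_user() stand-in */
    return validate_words(dst, n);
}
```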
The non-UPTR variant will be fine to add once you have a real-world
example that demonstrates a noticeable performance difference.
Previously, I thought that there would be some perf difference if GR3D
shaders were moved into the "insert-opcode" gather, but it was negligible
once I implemented it, and it should be even more negligible on modern
hardware.
(On early Tegras we could just copy into the pushbuffer but that
won't work for newer ones).
Yes, we should copy the data into a gather and then push that gather into
the channel's pushbuffer, just like the current upstream driver does.
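
As a toy model of that flow: the pushbuffer is a ring of words, and pushing
a gather writes a small descriptor (an opcode word carrying the length,
followed by the gather's address) rather than the commands themselves. The
names, the ring size, the 32-bit address and the opcode encoding here are
assumptions made for the sketch, and the real driver's pushbuffer is more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PB_WORDS 8u   /* hypothetical ring size in words */

/* Illustrative descriptor word: opcode in bits 31:28, word count below. */
#define OP_GATHER(count) ((6u << 28) | ((count) & 0x3fffu))

struct pushbuf {
    uint32_t ring[PB_WORDS];
    uint32_t put;   /* free-running count of words written by the CPU */
    uint32_t get;   /* free-running count of words consumed by hardware */
};

/* Number of words that can still be written without overrunning 'get'. */
static uint32_t pushbuf_space(const struct pushbuf *pb)
{
    return PB_WORDS - (pb->put - pb->get);
}

/*
 * Queue one gather: two ring words that point the hardware at a buffer
 * holding the already copied (and validated) commands.
 */
static bool pushbuf_push_gather(struct pushbuf *pb, uint32_t gather_addr,
                                uint32_t nwords)
{
    if (pushbuf_space(pb) < 2)
        return false;   /* ring full; the caller has to wait and retry */

    pb->ring[pb->put++ % PB_WORDS] = OP_GATHER(nwords);
    pb->ring[pb->put++ % PB_WORDS] = gather_addr;
    return true;
}

/* Model the hardware fetching 'nwords' words from the ring. */
static void pushbuf_consume(struct pushbuf *pb, uint32_t nwords)
{
    assert(nwords <= pb->put - pb->get);
    pb->get += nwords;
}
```

Because only the two descriptor words enter the ring, the size of the
pushbuffer is independent of the size of the gathers it references.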
Once the UAPI is settled, we'll also need to discuss the pushbuffer's
implementation, because the current driver has some problems with it.
True, for earlier Tegras we'll need to copy anyway. So let's just
implement copying for now, while making sure that extending to directly
mapping pages will be possible later (don't know why it wouldn't be),
and implement direct mapping or GEM gathers later if needed.
Mikko