25.06.2020 12:16, Mikko Perttunen wrote:
> On 6/25/20 2:11 AM, Dmitry Osipenko wrote:
>> 23.06.2020 15:09, Mikko Perttunen wrote:
>>>     /* Command is an opcode gather from a GEM handle */
>>>     #define DRM_TEGRA_SUBMIT_COMMAND_GATHER         0
>>>     /* Command is an opcode gather from a user pointer */
>>>     #define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
>>
>> I'm a bit dubious about whether we really need to retain the non-UPTR
>> variant. The memory-copying overhead is negligible because the
>> cmdstream's data is usually hot in the CPU's cache.
>>
>> IIRC, most (if not all) of the modern DRM drivers use usrptr-only
>> for the cmdstream.
>>
>> At least, there is no real-world userspace example today that could
>> benefit from a non-UPTR variant.
>>
>> I'm suggesting to leave out the non-UPTR gather variant for now, keeping
>> it in mind as a potential future extension of the submission UAPI. Any
>> objections?
>>
>
> Sure, we should be able to drop it. Downstream userspace is using it,
> but we should be able to fix that. I was thinking that we can directly
> map the user pages and point the gather to them without copying; that
> way we wouldn't need to make DMA allocations inside the driver for every
> submit.

We will need to create a Host1x DMA pool, and then the dynamic allocations will be cheap. This is an implementation detail that we can discuss separately.

We will need the UPTR anyway for the older Tegras, because we need to validate the cmdstreams, and it's much more efficient to copy from a UPTR than to read from uncacheable memory.

The non-UPTR variant will be fine to add if you have a real-world example that demonstrates a noticeable performance difference. Previously, I thought there would be some performance difference if the GR3D shaders were moved into the "insert-opcode" gather, but it was negligible once I implemented it, and it should be even more negligible on modern hardware.
> (On early Tegras we could just copy into the pushbuffer but that
> won't work for newer ones).

Yes, we should copy the data into a gather and then push it into the channel's pushbuffer, just as it works with the current upstream driver. Once the UAPI is settled, we'll also need to discuss the pushbuffer's implementation, because the current driver has some problems with it.
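A minimal userspace sketch of the copy-into-gather, validate, then push flow discussed in this thread, under loudly stated assumptions: struct gather, struct pushbuf, gather_copy_from_uptr(), gather_validate(), pushbuf_push() and submit_uptr() are hypothetical names, not the Tegra DRM or Host1x API; malloc() and memcpy() stand in for a DMA-pool allocation and copy_from_user(); the validation rule is a placeholder, not the real cmdstream firewall; and the pushbuffer is modelled as a ring of gather pointers rather than Host1x opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver-owned copy of the command words (a "gather"). */
struct gather {
    uint32_t *words;
    size_t num_words;
};

/* Hypothetical channel pushbuffer, modelled as a small ring of gather
 * references; a real pushbuffer holds Host1x opcodes, not pointers. */
#define PUSHBUF_SLOTS 4

struct pushbuf {
    const struct gather *slots[PUSHBUF_SLOTS];
    size_t head; /* next slot to write */
    size_t used; /* number of occupied slots */
};

/* Step 1: copy the user-pointer cmdstream into a gather. In a driver this
 * would be copy_from_user() into a DMA buffer; malloc/memcpy stand in here.
 * Returns 0 on success, -1 on bad input or allocation failure. */
static int gather_copy_from_uptr(struct gather *g, const uint32_t *uptr,
                                 size_t num_words)
{
    if (uptr == NULL || num_words == 0)
        return -1;
    g->words = malloc(num_words * sizeof(*g->words));
    if (g->words == NULL)
        return -1;
    memcpy(g->words, uptr, num_words * sizeof(*g->words));
    g->num_words = num_words;
    return 0;
}

/* Step 2: validate the copy, which lives in ordinary cacheable memory.
 * PLACEHOLDER rule: reject any word whose top nibble is 0xF. This is not
 * the actual Host1x firewall logic. Returns 0 if the stream is accepted. */
static int gather_validate(const struct gather *g)
{
    for (size_t i = 0; i < g->num_words; i++) {
        if ((g->words[i] >> 28) == 0xF)
            return -1;
    }
    return 0;
}

/* Step 3: queue a reference to the validated gather in the ring.
 * Returns 0 on success, -1 if the pushbuffer is full. */
static int pushbuf_push(struct pushbuf *pb, const struct gather *g)
{
    assert(pb->used <= PUSHBUF_SLOTS);
    if (pb->used == PUSHBUF_SLOTS)
        return -1;
    pb->slots[pb->head] = g;
    pb->head = (pb->head + 1) % PUSHBUF_SLOTS;
    pb->used++;
    return 0;
}

/* Whole submit path: copy, validate the copy, then push. On any failure
 * the gather is released and the pushbuffer is left untouched. */
int submit_uptr(struct pushbuf *pb, struct gather *g,
                const uint32_t *uptr, size_t num_words)
{
    if (gather_copy_from_uptr(g, uptr, num_words) != 0)
        return -1;
    if (gather_validate(g) != 0 || pushbuf_push(pb, g) != 0) {
        free(g->words);
        g->words = NULL;
        g->num_words = 0;
        return -1;
    }
    return 0;
}
```

Calling submit_uptr(&pb, &g, words, n) with a zero-initialised struct pushbuf copies the words, checks the driver-side copy rather than the caller's buffer, and only then queues it; a rejected stream leaves the pushbuffer unchanged.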