On 5/18/21 3:24 PM, Thomas Hellström wrote:
On 5/18/21 3:08 PM, Christian König wrote:
On 5/18/21 2:52 PM, Thomas Hellström wrote:
On 5/18/21 2:09 PM, Christian König wrote:
On 5/18/21 2:04 PM, Thomas Hellström wrote:
On 5/18/21 1:55 PM, Christian König wrote:
On 5/18/21 10:26 AM, Thomas Hellström wrote:
The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of sglist-represented
io memory, using the io_mem_reserve() / io_mem_free() callbacks, that
would cause problems with fault().
Instead, implement a method that maps page-by-page using kmap_local()
semantics. As additional benefits we then avoid the occasional global
TLB flushes of vmap() and the consumption of vmap space, eliminate a
critical point of failure, and with a slight change of semantics we
could also push the memcpy out async for testing and async driver
development purposes. Pushing out async can be done since there is no
memory allocation going on that could violate the dma_fence lockdep
rules.
For copies from iomem, use the WC prefetching memcpy variant for
additional speed.
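Very roughly, the per-page copy this results in looks like the sketch
below. Note that ttm_copy_page_from_iomap() is a made-up name for
illustration only, the source side assumes a struct io_mapping, and a
WC-prefetching read helper (such as i915's i915_memcpy_from_wc()) would
replace memcpy_fromio() where supported:

#include <linux/highmem.h>
#include <linux/io-mapping.h>
#include <linux/io.h>

/* Sketch: copy one PAGE_SIZE chunk from an io_mapping to a system page,
 * using only short-term local mappings. */
static void ttm_copy_page_from_iomap(struct io_mapping *iomap,
				     pgoff_t src_idx, struct page *dst)
{
	void __iomem *src = io_mapping_map_local_wc(iomap,
						    src_idx << PAGE_SHIFT);
	void *dst_va = kmap_local_page(dst);

	/* The WC-prefetching (movntdqa-based) memcpy variant would be
	 * used here instead on platforms that support it. */
	memcpy_fromio(dst_va, src, PAGE_SIZE);

	kunmap_local(dst_va);
	io_mapping_unmap_local(src);
}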
Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.
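To make the "specialized iterator" idea a bit more concrete, something
along the lines of the sketch below could work for a driver whose VRAM
is a single memremap()ed, linearly mapped region. None of these names
exist in TTM yet, so treat this purely as an illustration of the
proposed shape:

#include <linux/kernel.h>
#include <linux/types.h>

struct ttm_kmap_iter;

/* Proposed per-resource mapping iterator: map / unmap one page at a
 * time with kmap_local()-like semantics. */
struct ttm_kmap_iter_ops {
	void *(*map_local)(struct ttm_kmap_iter *iter, pgoff_t i);
	void (*unmap_local)(struct ttm_kmap_iter *iter, void *vaddr);
};

struct ttm_kmap_iter {
	const struct ttm_kmap_iter_ops *ops;
};

/* Hypothetical driver-side iterator for linearly mapped VRAM. */
struct my_vram_kmap_iter {
	struct ttm_kmap_iter base;
	void *vram_base;		/* from memremap() */
	resource_size_t offset;		/* start of the resource in VRAM */
};

static void *my_vram_map_local(struct ttm_kmap_iter *iter, pgoff_t i)
{
	struct my_vram_kmap_iter *vi =
		container_of(iter, struct my_vram_kmap_iter, base);

	return vi->vram_base + vi->offset +
		((resource_size_t)i << PAGE_SHIFT);
}

static void my_vram_unmap_local(struct ttm_kmap_iter *iter, void *vaddr)
{
	/* Nothing to do for a permanently mapped region. */
}

static const struct ttm_kmap_iter_ops my_vram_kmap_iter_ops = {
	.map_local = my_vram_map_local,
	.unmap_local = my_vram_unmap_local,
};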
In general yes please, since I have had that as a TODO for TTM for a
very long time.
But I would prefer to fix the implementation in TTM instead and give it
proper cursor handling.
Amdgpu is also using page-based iomem and we have similar workarounds
in place there as well.
I think it makes sense to unify this inside TTM and remove the old
memcpy util function when done.
Regards,
Christian.
Christian,
I was thinking that when we replace bo.mem with a pointer (and perhaps
have a driver callback to allocate bo->mem), we could embed a struct
ttm_kmap_iter and use it for all mapping in one way or another. Could we
then perhaps land this in i915 now and sort out the unification once the
struct ttm_resource / struct ttm_buffer_object separation has landed?
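Something like the following is what I mean by embedding, purely
hypothetical and reusing the iterator sketch from the patch description:

/* Hypothetical driver-allocated resource with the mapping iterator
 * embedded, so common code can use it for all CPU mapping. */
struct my_driver_resource {
	struct ttm_resource base;
	struct ttm_kmap_iter kmap_iter;	/* set up by the driver's
					 * resource allocation callback */
};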
That stuff is ready and reviewed, and I'm just waiting for some amdgpu
changes to land in drm-misc-next before pushing it.
But yes, in general an iterator for the resource object sounds like the
right plan to me as well.
Christian.
OK, so then are you OK with landing this in i915 for now? That would of
course also mean the export you NAK'd, but strictly for this memcpy use
until we merge it with TTM?
Well, you can of course prototype that in i915, but I really don't want
to export the TT functions upstream.
I understand. I once had the same thoughts, trying to avoid that as far
as possible, so this function was actually added to the ttm_bo interface
(hence the awkward naming) as a helper for drivers implementing move(),
essentially a very special case of ttm_bo_move_accel_cleanup(). But
anyway, see below:
Can we cleanly move that functionality into TTM instead?
I'll take a look at that, but I think we'd initially have iterators
mimicking what the current move_memcpy() does for the linear iomem !WC
cases; hope that's OK.
Yeah, that's perfectly fine with me. I can tackle cleaning up all
drivers and moving them over to the new implementation when that is
fully complete.
As I said, we already have the same problem in amdgpu and only solved it
by avoiding memcpy altogether.
Christian.
/Thomas
Christian.
/Thomas
/Thomas