On 5/31/21 2:36 PM, Christian König wrote:
Am 31.05.21 um 14:19 schrieb Thomas Hellström:
The internal ttm_bo_util memcpy uses ioremap functionality, and while it
might be possible to use it for copying into and out of
sglist-represented io memory, using the io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As additional benefits, we avoid the occasional global
TLB flushes of ioremap(), stop consuming ioremap space, and eliminate a
critical point of failure. With a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes.
A special linear iomem iterator is introduced internally to mimic the
old ioremap behaviour for code-paths that can't immediately be ported
over. This adds to the code size and should be considered a temporary
solution.
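To illustrate the page-by-page idea described above, here is a minimal user-space sketch. It is not the TTM interface from this patch: every name in it (`sketch_iter`, `sketch_iter_ops`, `sketch_linear`, `sketch_pages`, `sketch_copy`) is hypothetical, the 4 KiB page size is an assumption, and plain pointers stand in for kmap_local()-style mappings. It only shows how a linear region and a page array can sit behind the same iterator so that one copy loop serves both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

struct sketch_iter;

/* Hypothetical ops table: map page i, and optionally unmap it again. */
struct sketch_iter_ops {
	void *(*map_local)(struct sketch_iter *it, size_t i);
	void (*unmap_local)(struct sketch_iter *it, void *addr);
};

struct sketch_iter {
	const struct sketch_iter_ops *ops;
};

/* Backend 1: one contiguous region, addressed linearly. */
struct sketch_linear {
	struct sketch_iter base; /* must stay the first member */
	unsigned char *start;
};

static void *linear_map(struct sketch_iter *it, size_t i)
{
	struct sketch_linear *lin = (struct sketch_linear *)it;

	return lin->start + i * SKETCH_PAGE_SIZE;
}

/* Backend 2: an array of separately allocated pages. */
struct sketch_pages {
	struct sketch_iter base; /* must stay the first member */
	unsigned char **pages;
};

static void *pages_map(struct sketch_iter *it, size_t i)
{
	struct sketch_pages *pg = (struct sketch_pages *)it;

	return pg->pages[i];
}

/* Neither backend needs an unmap step here, so unmap_local is NULL. */
static const struct sketch_iter_ops linear_ops = { linear_map, NULL };
static const struct sketch_iter_ops pages_ops = { pages_map, NULL };

/* Copy num_pages pages, mapping only one source and one destination page at a time. */
static void sketch_copy(struct sketch_iter *dst, struct sketch_iter *src,
			size_t num_pages)
{
	for (size_t i = 0; i < num_pages; i++) {
		void *d = dst->ops->map_local(dst, i);
		void *s = src->ops->map_local(src, i);

		assert(d && s);
		memcpy(d, s, SKETCH_PAGE_SIZE);

		/* Release in reverse order of mapping, as nested local mappings require. */
		if (src->ops->unmap_local)
			src->ops->unmap_local(src, s);
		if (dst->ops->unmap_local)
			dst->ops->unmap_local(dst, d);
	}
}
```

A caller would fill in a `sketch_linear` for one side and a `sketch_pages` for the other, then pass `&dst.base` and `&src.base` to `sketch_copy()`. The copy loop never needs to know which backend it is talking to.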
Looking at the code we have a lot of checks for iomap tagged pointers.
Ideally we should extend the core memremap functions to also accept
uncached memory and kmap_local functionality. Then we could strip a
lot of code.
Cc: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
---
v3:
- Split up in various TTM files and addressed review comments by
Christian König. Tested and fixed legacy iomap memcpy path on i915.
v4:
- Fix an uninitialized variable
Reported by: kernel test robot <lkp@xxxxxxxxx>
Reported by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
- Minor change to the ttm_move_memcpy() interface.
- Gracefully handle lack of memremap() support on memcpy
(Reported by Matthew Auld)
- Minor style fix (Reported by Matthew Auld)
---
drivers/gpu/drm/ttm/ttm_bo_util.c | 280 ++++++++++-------------------
drivers/gpu/drm/ttm/ttm_module.c | 35 ++++
drivers/gpu/drm/ttm/ttm_resource.c | 193 ++++++++++++++++++++
drivers/gpu/drm/ttm/ttm_tt.c | 42 +++++
include/drm/ttm/ttm_bo_driver.h | 28 +++
include/drm/ttm/ttm_caching.h | 2 +
include/drm/ttm/ttm_kmap_iter.h | 61 +++++++
include/drm/ttm/ttm_resource.h | 61 +++++++
include/drm/ttm/ttm_tt.h | 16 ++
9 files changed, 536 insertions(+), 182 deletions(-)
create mode 100644 include/drm/ttm/ttm_kmap_iter.h
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index ae8b61460724..6ac7744a1a5c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -72,190 +72,126 @@ void ttm_mem_io_free(struct ttm_device *bdev,
mem->bus.addr = NULL;
}
-static int ttm_resource_ioremap(struct ttm_device *bdev,
- struct ttm_resource *mem,
- void **virtual)
+/**
+ * ttm_move_memcpy - Helper to perform a memcpy ttm move operation.
+ * @bo: The struct ttm_buffer_object.
+ * @new_mem: The struct ttm_resource we're moving to (copy destination).
+ * @new_iter: A struct ttm_kmap_iter representing the destination resource.
+ * @src_iter: A struct ttm_kmap_iter representing the source resource.
+ *
+ * This function is intended to be able to move out async under a
+ * dma-fence if desired.
+ */
+void ttm_move_memcpy(struct ttm_buffer_object *bo,
+ pgoff_t num_pages,
Can we switch to uint32_t for num_pages for TTM in general?
That allows copying 16TiB with 4KiB pages, which should be
enough for quite a while. I have had some really bad bugs because people
tend to do << PAGE_SHIFT and sometimes forget that the value is only 32 bits wide.
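The 32-bit shift pitfall mentioned above can be shown with a small user-space sketch. The helper names and `SKETCH_PAGE_SHIFT` are made up for illustration, and 4 KiB pages are assumed. With a 32-bit page count, the byte size only comes out right if the value is widened to 64 bits before the shift.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Buggy pattern: the shift happens in 32 bits, so the high bits are lost. */
static uint64_t bytes_truncated(uint32_t num_pages)
{
	return (uint32_t)(num_pages << SKETCH_PAGE_SHIFT);
}

/* Safe pattern: widen to 64 bits first, then shift. */
static uint64_t bytes_widened(uint32_t num_pages)
{
	return (uint64_t)num_pages << SKETCH_PAGE_SHIFT;
}
```

For a page count of 2^20 (4 GiB worth of 4 KiB pages), the truncated version returns 0 while the widened version returns 2^32. The largest 32-bit page count maps to 16 TiB minus one page, which is where the 16TiB figure comes from.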
I can do that, although IIRC we've had some discussions internally that
16TiB isn't enough for our bos in general, so at some point a request
from us might be to see what we can do to bump that across TTM for
64-bit?
Matthew, you looked at this a couple of weeks ago?
Apart from that feel free to stick my rb on the patch.
Thanks!
/Thomas
Christian.