Re: [PATCH v3 2/7] drm: Add drm_memcpy_from_wc() variant which accepts destination address

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 13.07.22 um 17:47 schrieb Lucas De Marchi:
On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan wrote:
Fast copy using non-temporal instructions for x86 currently exists at two
locations. One is implemented in i915 driver at i915/i915_memcpy.c and
another copy at drm_cache.c. The plan is to remove the duplicate
implementation in i915 driver and use the functions from drm_cache.c.

A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts
address as argument instead of iosys_map for destination. It is a very
common scenario in i915 to copy from a WC memory type, which may be an
io memory or a system memory to a destination address pointing to system
memory. To avoid the overhead of creating iosys_map type for the
destination, new variant is created to accept the address directly.

Well that is pretty much a NAK. iosys_map was created to avoid exactly this kind of usage.

Creating a temporary iosys_map instance on the stack should have basically no overhead at all.


Also a new function is exported in drm_cache.c to find if the fast copy
is supported by the platform or not. It is required for i915.

v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides
the offset into the src address to start copy from.

Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
Cc: Maxime Ripard <mripard@xxxxxxxxxx>
Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
Cc: David Airlie <airlied@xxxxxxxx>
Cc: Daniel Vetter <daniel@xxxxxxxx>
Cc: Thomas Hellstr_m <thomas.hellstrom@xxxxxxxxxxxxxxx>

           ^ utf-8 error?


Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@xxxxxxxxx>
Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
Reviewed-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
---
drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++
include/drm/drm_cache.h     |  3 ++
2 files changed, 58 insertions(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 2e2545df3310..8c7af755f7bc 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
}
EXPORT_SYMBOL(drm_memcpy_from_wc);

+/**
+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
+ * that may be WC to a destination in system memory.
+ * @dst: The destination pointer
+ * @src: The source pointer
+ * @src_offset: The offset from which to copy
+ * @len: The size of the area to transfer in bytes
+ *
+ * Same as drm_memcpy_from_wc except destination is accepted as system memory + * address. Useful in situations where passing destination address as iosys_map
+ * is simply an overhead and can be avoided.
+ */
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+                  size_t src_offset, unsigned long len)
+{
+    const void *src_addr = src->is_iomem ?
+                (void const __force *)src->vaddr_iomem :
+                src->vaddr;

checking this again... this is a layer violation. We do have some in
the codebase, but it'd be good not to add more.

Yes, agree completely. That here is something we should try hard to avoid.


later I'd like these memcpy variants to be moved iosys-map. I'm not sure
if Thomas Zimmermann or Christian agree. Cc'ing them here.

Again yes, that sounds like a good idea to me as well.


Maybe we could scan the codebase and add a function in iosys-map like:

    void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem)

... which is similar to the ttm_kmap_obj_virtual() function we have in
ttm.

so the cases where the layer is violated at least they come from a
single function call in iosys_map API rather than directly accessing
is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here?

As noted above creating an iosys_map instance on the stack has basically no overhead at all.

So we should really switch those functions over to using the structure directly.

Regards,
Christian.


Lucas De Marchi

+
+    if (WARN_ON(in_interrupt())) {
+        iosys_map_memcpy_from(dst, src, src_offset, len);
+        return;
+    }
+
+    if (static_branch_likely(&has_movntdqa)) {
+        __drm_memcpy_from_wc(dst, src_addr + src_offset, len);
+        return;
+    }
+
+    iosys_map_memcpy_from(dst, src, src_offset, len);
+}
+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
+
+/*
+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
+ * instructions is supported
+ *
+ * Returns true if platform has support for fast copying from wc memory type
+ * using non-temporal instructions. Else false.
+ */
+bool drm_memcpy_fastcopy_supported(void)
+{
+    if (static_branch_likely(&has_movntdqa))
+        return true;
+
+    return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
/*
 * drm_memcpy_init_early - One time initialization of the WC memcpy code
 */
@@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
}
EXPORT_SYMBOL(drm_memcpy_from_wc);

+bool drm_memcpy_fastcopy_supported(void)
+{
+    return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
void drm_memcpy_init_early(void)
{
}
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 22deb216b59c..d1b57c84a659 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
void drm_memcpy_from_wc(struct iosys_map *dst,
            const struct iosys_map *src,
            unsigned long len);
+bool drm_memcpy_fastcopy_supported(void);
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+                  size_t src_offset, unsigned long len);
#endif
--
2.25.1





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux