On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan wrote:
Fast copy using non-temporal instructions for x86 currently exists at two locations. One is implemented in i915 driver at i915/i915_memcpy.c and another copy at drm_cache.c. The plan is to remove the duplicate implementation in i915 driver and use the functions from drm_cache.c. A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts address as argument instead of iosys_map for destination. It is a very common scenario in i915 to copy from a WC memory type, which may be an io memory or a system memory to a destination address pointing to system memory. To avoid the overhead of creating iosys_map type for the destination, new variant is created to accept the address directly. Also a new function is exported in drm_cache.c to find if the fast copy is supported by the platform or not. It is required for i915. v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides the offset into the src address to start copy from. Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> Cc: Maxime Ripard <mripard@xxxxxxxxxx> Cc: Thomas Zimmermann <tzimmermann@xxxxxxx> Cc: David Airlie <airlied@xxxxxxxx> Cc: Daniel Vetter <daniel@xxxxxxxx> Cc: Thomas Hellstr_m <thomas.hellstrom@xxxxxxxxxxxxxxx>
^ utf-8 error?
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@xxxxxxxxx> Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Reviewed-by: Nirmoy Das <nirmoy.das@xxxxxxxxx> --- drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++ include/drm/drm_cache.h | 3 ++ 2 files changed, 58 insertions(+) diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 2e2545df3310..8c7af755f7bc 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst, } EXPORT_SYMBOL(drm_memcpy_from_wc); +/** + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source + * that may be WC to a destination in system memory. + * @dst: The destination pointer + * @src: The source pointer + * @src_offset: The offset from which to copy + * @len: The size of the area to transfer in bytes + * + * Same as drm_memcpy_from_wc except destination is accepted as system memory + * address. Useful in situations where passing destination address as iosys_map + * is simply an overhead and can be avoided. + */ +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, + size_t src_offset, unsigned long len) +{ + const void *src_addr = src->is_iomem ? + (void const __force *)src->vaddr_iomem : + src->vaddr;
checking this again... this is a layer violation. We do have some in the codebase, but it'd be good not to add more. later I'd like these memcpy variants to be moved iosys-map. I'm not sure if Thomas Zimmermann or Christian agree. Cc'ing them here. Maybe we could scan the codebase and add a function in iosys-map like: void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem) ... which is similar to the ttm_kmap_obj_virtual() function we have in ttm. so the cases where the layer is violated at least they come from a single function call in iosys_map API rather than directly accessing is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here? Lucas De Marchi
+ + if (WARN_ON(in_interrupt())) { + iosys_map_memcpy_from(dst, src, src_offset, len); + return; + } + + if (static_branch_likely(&has_movntdqa)) { + __drm_memcpy_from_wc(dst, src_addr + src_offset, len); + return; + } + + iosys_map_memcpy_from(dst, src, src_offset, len); +} +EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr); + +/* + * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal + * instructions is supported + * + * Returns true if platform has support for fast copying from wc memory type + * using non-temporal instructions. Else false. + */ +bool drm_memcpy_fastcopy_supported(void) +{ + if (static_branch_likely(&has_movntdqa)) + return true; + + return false; +} +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); + /* * drm_memcpy_init_early - One time initialization of the WC memcpy code */ @@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst, } EXPORT_SYMBOL(drm_memcpy_from_wc); +bool drm_memcpy_fastcopy_supported(void) +{ + return false; +} +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); + void drm_memcpy_init_early(void) { } diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h index 22deb216b59c..d1b57c84a659 100644 --- a/include/drm/drm_cache.h +++ b/include/drm/drm_cache.h @@ -77,4 +77,7 @@ void drm_memcpy_init_early(void); void drm_memcpy_from_wc(struct iosys_map *dst, const struct iosys_map *src, unsigned long len); +bool drm_memcpy_fastcopy_supported(void); +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, + size_t src_offset, unsigned long len); #endif -- 2.25.1