Re: [Intel-gfx] [PATCH 1/2] iosys-map: Add per-word read

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 17, 2022 at 01:52:03AM -0700, Lucas De Marchi wrote:
Instead of always falling back to memcpy_fromio() for any size, prefer
using read{b,w,l}(). When reading struct members it's common to read
individual integer variables individually. Going through memcpy_fromio()
for each of them poses a high penalty.

Employ a similar trick as __seqprop() by using _Generic() to generate
only the specific call based on a type-compatible variable.

For a pariticular i915 workload producing GPU context switches,
__get_engine_usage_record() is particularly hot since the engine usage
is read from device local memory with dgfx, possibly multiple times
since it's racy. Test execution time for this test shows a ~12.5%
improvement with DG2:

Before:
	nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07;
	median = 9.52548e+06; var = 526149;
After:
	nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06;
	median = 8.33955e+06; var = 333113;

Other things attempted that didn't prove very useful:
1) Change the _Generic() on x86 to just dereference the memory address
2) Change __get_engine_usage_record() to do just 1 read per loop,
  comparing with the previous value read
3) Change __get_engine_usage_record() to access the fields directly as it
  was before the conversion to iosys-map

(3) did gave a small improvement (~3%), but doesn't seem to scale well
to other similar cases in the driver.

Additional test by Chris Wilson using gem_create from igt with some
changes to track object creation time. This happens to accidentally
stress this code path:

	Pre iosys_map conversion of engine busyness:
	lmem0: Creating    262144 4KiB objects took 59274.2ms

	Unpatched:
	lmem0: Creating    262144 4KiB objects took 108830.2ms

	With readl (this patch):
	lmem0: Creating    262144 4KiB objects took 61348.6ms

	s/readl/READ_ONCE/
	lmem0: Creating    262144 4KiB objects took 61333.2ms

So we do take a little bit more time than before the conversion, but
that is due to other factors: bringing the READ_ONCE back would be as
good as just doing this conversion.

v2:
- Remove default from _Generic() - callers wanting to read more
 than u64 should use iosys_map_memcpy_from()
- Add READ_ONCE() cases dereferencing the pointer when using system
 memory

Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx> # v1
---
include/linux/iosys-map.h | 45 +++++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index 4b8406ee8bc4..f59dd00ed202 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -6,6 +6,7 @@
#ifndef __IOSYS_MAP_H__
#define __IOSYS_MAP_H__

+#include <linux/compiler_types.h>
#include <linux/io.h>
#include <linux/string.h>

@@ -333,6 +334,26 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
		memset(dst->vaddr + offset, value, len);
}

+#ifdef CONFIG_64BIT
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)				\
+	u64: val_ = readq(vaddr_iomem_)
+#else
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)				\
+	u64: memcpy_fromio(&(val_), vaddr_iomem__, sizeof(u64))

I tested io/sys and forgot again to test it for 32-bit :(. This
should fix the build for 32-bits:

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index 580e14cd360c..f8bc052f8975 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -341,7 +341,7 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
 	u64: writeq(val_, vaddr_iomem_)
 #else
 #define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)				\
-	u64: memcpy_fromio(&(val_), vaddr_iomem__, sizeof(u64))
+	u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64))
 #define __iosys_map_wr_io_u64_case(val_, vaddr_iomem_)			\
 	u64: memcpy_toio(vaddr_iomem_, &(val_), sizeof(u64))
 #endif

Lucas De Marchi



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux