On 26/11/2017 12:20, Chris Wilson wrote:
Looking at the distribution of i915_wait_request for a set of GL
benchmarks, we see:
broadwell# python bcc/tools/funclatency.py -u i915_wait_request
     usecs               : count    distribution
         0 -> 1          : 29184    |****************************************|
         2 -> 3          : 5767     |*******                                 |
         4 -> 7          : 3000     |****                                    |
         8 -> 15         : 491      |                                        |
        16 -> 31         : 140      |                                        |
        32 -> 63         : 203      |                                        |
        64 -> 127        : 543      |                                        |
       128 -> 255        : 881      |*                                       |
       256 -> 511        : 1209     |*                                       |
       512 -> 1023       : 1739     |**                                      |
      1024 -> 2047       : 22855    |*******************************         |
      2048 -> 4095       : 1725     |**                                      |
      4096 -> 8191       : 5813     |*******                                 |
      8192 -> 16383      : 5348     |*******                                 |
     16384 -> 32767      : 1000     |*                                       |
     32768 -> 65535      : 4400     |******                                  |
     65536 -> 131071     : 296      |                                        |
    131072 -> 262143     : 225      |                                        |
    262144 -> 524287     : 4        |                                        |
    524288 -> 1048575    : 1        |                                        |
   1048576 -> 2097151    : 1        |                                        |
   2097152 -> 4194303    : 1        |                                        |
broxton# python bcc/tools/funclatency.py -u i915_wait_request
     usecs               : count    distribution
         0 -> 1          : 5523     |*************************************   |
         2 -> 3          : 1340     |*********                               |
         4 -> 7          : 2100     |**************                          |
         8 -> 15         : 755      |*****                                   |
        16 -> 31         : 211      |*                                       |
        32 -> 63         : 53       |                                        |
        64 -> 127        : 71       |                                        |
       128 -> 255        : 113      |                                        |
       256 -> 511        : 262      |*                                       |
       512 -> 1023       : 358      |**                                      |
      1024 -> 2047       : 1105     |*******                                 |
      2048 -> 4095       : 848      |*****                                   |
      4096 -> 8191       : 1295     |********                                |
      8192 -> 16383      : 5894     |****************************************|
     16384 -> 32767      : 4270     |****************************            |
     32768 -> 65535      : 5622     |**************************************  |
     65536 -> 131071     : 306      |**                                      |
    131072 -> 262143     : 50       |                                        |
    262144 -> 524287     : 76       |                                        |
    524288 -> 1048575    : 34       |                                        |
   1048576 -> 2097151    : 0        |                                        |
   2097152 -> 4194303    : 1        |                                        |
Picking 20us for the context-switch busyspin has the dual advantage of
catching the most frequent short waits while avoiding the cost of a
context switch. 20us is a typical latency of 2 context switches, i.e. the
cost of taking the sleep, without the secondary effects of cache flushing.
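As a rough cross-check of the "most frequent short waits" claim, the two histograms above can be summed to see what share of waits falls under a given spin limit. This is only a sketch: the bucket counts are copied from the funclatency output, the helper name covered() is made up for illustration, and funclatency times the whole of i915_wait_request rather than the spin itself, so the result is a proxy at best. Since 20us lands inside the 16 -> 31 bucket, the helper returns a lower and an upper bound.

```python
# Bucket counts copied from the funclatency output above.
# Bucket i covers [2**i, 2**(i+1) - 1] usecs, except bucket 0 which is [0, 1].
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def covered(counts, spin_us):
    """Return (lower, upper, total) wait counts for a spin limit in usecs.

    lower counts only buckets that lie entirely within spin_us; upper also
    counts the bucket that spin_us falls into.
    """
    lower = upper = 0
    for i, count in enumerate(counts):
        first = 0 if i == 0 else 2 ** i
        last = 2 ** (i + 1) - 1
        if last <= spin_us:
            lower += count
        if first <= spin_us:
            upper += count
    return lower, upper, sum(counts)


if __name__ == "__main__":
    for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
        for spin_us in (2, 20):
            lower, upper, total = covered(counts, spin_us)
            print(f"{name}: spin {spin_us}us covers between "
                  f"{lower / total:.1%} and {upper / total:.1%} of {total} waits")
```

The old default of 2 is included only for comparison with the new default of 20.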
The next thing I wanted to ask about is the cumulative time spent
spinning versus the test duration, or in other words, CPU usage before
and after the change.
And of course, was the benefit on benchmark results measurable, by how
much, and what does the perf per Watt say?
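To make the CPU usage question concrete, one possible way to collect the numbers is to run the same benchmark on kernels built with each default and compare child CPU time against wall time. The sketch below is an assumption about methodology, not something from the patch: cpu_vs_wall() is a hypothetical helper, and it relies on the kernel busyspin being charged as system time to the waiting process. It does not measure power.

```python
import resource
import subprocess
import sys
import time


def cpu_vs_wall(cmd):
    """Run cmd to completion and return (wall_s, user_s, sys_s) for it.

    CPU times come from RUSAGE_CHILDREN, so they cover the child process
    once it has been waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (wall,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    # The benchmark command is taken from the command line; the fallback
    # is a trivial placeholder so the script runs standalone.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    wall, user, system = cpu_vs_wall(cmd)
    print(f"wall {wall:.3f}s user {user:.3f}s sys {system:.3f}s "
          f"cpu/wall {(user + system) / wall:.1%}")
```

Comparing the sys and cpu/wall figures between the two builds would give a first indication of how much extra CPU the longer spin costs.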
Regards,
Tvrtko
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
---
drivers/gpu/drm/i915/Kconfig.profile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index a1aed0e2aad5..c8fe5754466c 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -11,7 +11,7 @@ config DRM_I915_SPIN_REQUEST_IRQ
config DRM_I915_SPIN_REQUEST_CS
int
- default 2 # microseconds
+ default 20 # microseconds
help
After sleeping for a request (GPU operation) to complete, we will
be woken up on the completion of every request prior to the one