Re: [PATCH 3/5] drm/i915: Increase busyspin limit before a context-switch

On 02/08/2018 15:40, Tvrtko Ursulin wrote:

On 28/07/2018 17:46, Chris Wilson wrote:
Looking at the distribution of i915_wait_request for a set of GL

What was the set?

benchmarks, we see:

broadwell# python bcc/tools/funclatency.py -u i915_wait_request
    usecs               : count     distribution
        0 -> 1          : 29184    |****************************************|
        2 -> 3          : 5767     |*******                                 |
        4 -> 7          : 3000     |****                                    |
        8 -> 15         : 491      |                                        |
       16 -> 31         : 140      |                                        |
       32 -> 63         : 203      |                                        |
       64 -> 127        : 543      |                                        |
      128 -> 255        : 881      |*                                       |
      256 -> 511        : 1209     |*                                       |
      512 -> 1023       : 1739     |**                                      |
     1024 -> 2047       : 22855    |*******************************         |
     2048 -> 4095       : 1725     |**                                      |
     4096 -> 8191       : 5813     |*******                                 |
     8192 -> 16383      : 5348     |*******                                 |
    16384 -> 32767      : 1000     |*                                       |
    32768 -> 65535      : 4400     |******                                  |
    65536 -> 131071     : 296      |                                        |
   131072 -> 262143     : 225      |                                        |
   262144 -> 524287     : 4        |                                        |
   524288 -> 1048575    : 1        |                                        |
  1048576 -> 2097151    : 1        |                                        |
  2097152 -> 4194303    : 1        |                                        |

broxton# python bcc/tools/funclatency.py -u i915_wait_request
    usecs               : count     distribution
        0 -> 1          : 5523     |*************************************   |
        2 -> 3          : 1340     |*********                               |
        4 -> 7          : 2100     |**************                          |
        8 -> 15         : 755      |*****                                   |
       16 -> 31         : 211      |*                                       |
       32 -> 63         : 53       |                                        |
       64 -> 127        : 71       |                                        |
      128 -> 255        : 113      |                                        |
      256 -> 511        : 262      |*                                       |
      512 -> 1023       : 358      |**                                      |
     1024 -> 2047       : 1105     |*******                                 |
     2048 -> 4095       : 848      |*****                                   |
     4096 -> 8191       : 1295     |********                                |
     8192 -> 16383      : 5894     |****************************************|
    16384 -> 32767      : 4270     |****************************            |
    32768 -> 65535      : 5622     |**************************************  |
    65536 -> 131071     : 306      |**                                      |
   131072 -> 262143     : 50       |                                        |
   262144 -> 524287     : 76       |                                        |
   524288 -> 1048575    : 34       |                                        |
  1048576 -> 2097151    : 0        |                                        |
  2097152 -> 4194303    : 1        |                                        |

Picking 20us for the context-switch busyspin has the dual advantage of
catching most frequent short waits while avoiding the cost of a context
switch. 20us is a typical latency of 2 context-switches, i.e. the cost
of taking the sleep, without the secondary effects of cache flushing.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
---
  drivers/gpu/drm/i915/Kconfig.profile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 63cb744d920d..de394dea4a14 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -14,7 +14,7 @@ config DRM_I915_SPIN_REQUEST_IRQ
  config DRM_I915_SPIN_REQUEST_CS
      int
-    default 2 # microseconds
+    default 20 # microseconds
      help
        After sleeping for a request (GPU operation) to complete, we will
        be woken up on the completion of every request prior to the one


I'd be more tempted to pick 10us given the histograms. It would avoid wasting cycles on Broadwell and keep the majority of the benefit on Broxton.

Actually, the first spin is 5us, so are you sure bumping the second spin should be the first step? In other words, wouldn't bumping the first one to 10us eliminate most of the low bars from the histogram?
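
To put rough numbers on the 10us versus 20us question, the quoted histograms can be turned into cumulative fractions. The following is a minimal sketch, not part of the patch: the counts are copied from the funclatency output above, and the helper name cumulative_fraction is made up for illustration. Because the buckets are powers of two, 10us and 20us fall inside the 8 -> 15 and 16 -> 31 buckets, so this only brackets the answer.

```python
# Per-bucket counts copied from the funclatency output quoted above.
# Bucket i spans 2**i .. 2**(i+1)-1 usecs (bucket 0 is "0 -> 1").
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def cumulative_fraction(counts):
    """Return [(bucket_upper_bound_us, fraction_of_waits_at_or_below), ...]."""
    total = sum(counts)
    running = 0
    result = []
    for i, count in enumerate(counts):
        running += count
        result.append((2 ** (i + 1) - 1, running / total))
    return result


if __name__ == "__main__":
    for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
        print(name)
        # Only the first five buckets (up to 31us) matter for a 5-20us spin.
        for upper, frac in cumulative_fraction(counts)[:5]:
            print(f"  <= {upper:2d}us: {frac:6.1%}")
```

Comparing the "<= 15us" and "<= 31us" rows shows how many additional waits a longer spin could catch on each machine.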

Regards,

Tvrtko

However, it also raises the question of whether we want this initialized per-platform at runtime. That would open the way to auto-tuning it, if the goal is to eliminate the low part of the histogram.
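
As a purely hypothetical illustration of what such auto-tuning could look like, here is an offline sketch that picks a spin limit from a latency histogram. Nothing here exists in i915: the function pick_spin_limit, its min_gain parameter, and the stopping rule (keep extending the spin while each extra power-of-two bucket still catches at least 1% of all waits) are assumptions made for the example. The counts are again copied from the funclatency output above.

```python
# Per-bucket counts copied from the funclatency output quoted above.
# Bucket i spans 2**i .. 2**(i+1)-1 usecs (bucket 0 is "0 -> 1").
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def pick_spin_limit(counts, min_gain=0.01):
    """Hypothetical heuristic: grow the spin limit bucket by bucket and stop
    at the first bucket holding less than `min_gain` of all waits.

    Returns the upper bound (in usecs) of the last bucket kept, or 0 if even
    the first bucket is below the threshold.
    """
    total = sum(counts)
    if total == 0:
        return 0
    limit_us = 0
    for i, count in enumerate(counts):
        if count / total < min_gain:
            break
        limit_us = 2 ** (i + 1) - 1
    return limit_us


if __name__ == "__main__":
    for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
        print(f"{name}: spin up to {pick_spin_limit(counts)}us")
```

With the quoted data this heuristic settles on a shorter limit for broadwell than for broxton, which is the kind of per-platform difference a single Kconfig default cannot express. A runtime version would need its own sampling and would have to weigh the CPU cycles spent spinning, which this sketch ignores.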

Also, please add to the commit message what effect this has on the benchmarks, e.g. perf/watt or similar.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx