Re: [PATCH v3 2/2] drm/i915: Increase busyspin limit before a context-switch

On 26/11/2017 12:20, Chris Wilson wrote:
Looking at the latency distribution of i915_wait_request for a set of GL
benchmarks, we see:

broadwell# python bcc/tools/funclatency.py -u i915_wait_request
    usecs               : count     distribution
        0 -> 1          : 29184    |****************************************|
        2 -> 3          : 5767     |*******                                 |
        4 -> 7          : 3000     |****                                    |
        8 -> 15         : 491      |                                        |
       16 -> 31         : 140      |                                        |
       32 -> 63         : 203      |                                        |
       64 -> 127        : 543      |                                        |
      128 -> 255        : 881      |*                                       |
      256 -> 511        : 1209     |*                                       |
      512 -> 1023       : 1739     |**                                      |
     1024 -> 2047       : 22855    |*******************************         |
     2048 -> 4095       : 1725     |**                                      |
     4096 -> 8191       : 5813     |*******                                 |
     8192 -> 16383      : 5348     |*******                                 |
    16384 -> 32767      : 1000     |*                                       |
    32768 -> 65535      : 4400     |******                                  |
    65536 -> 131071     : 296      |                                        |
   131072 -> 262143     : 225      |                                        |
   262144 -> 524287     : 4        |                                        |
   524288 -> 1048575    : 1        |                                        |
  1048576 -> 2097151    : 1        |                                        |
  2097152 -> 4194303    : 1        |                                        |

broxton# python bcc/tools/funclatency.py -u i915_wait_request
    usecs               : count     distribution
        0 -> 1          : 5523     |*************************************   |
        2 -> 3          : 1340     |*********                               |
        4 -> 7          : 2100     |**************                          |
        8 -> 15         : 755      |*****                                   |
       16 -> 31         : 211      |*                                       |
       32 -> 63         : 53       |                                        |
       64 -> 127        : 71       |                                        |
      128 -> 255        : 113      |                                        |
      256 -> 511        : 262      |*                                       |
      512 -> 1023       : 358      |**                                      |
     1024 -> 2047       : 1105     |*******                                 |
     2048 -> 4095       : 848      |*****                                   |
     4096 -> 8191       : 1295     |********                                |
     8192 -> 16383      : 5894     |****************************************|
    16384 -> 32767      : 4270     |****************************            |
    32768 -> 65535      : 5622     |**************************************  |
    65536 -> 131071     : 306      |**                                      |
   131072 -> 262143     : 50       |                                        |
   262144 -> 524287     : 76       |                                        |
   524288 -> 1048575    : 34       |                                        |
  1048576 -> 2097151    : 0        |                                        |
  2097152 -> 4194303    : 1        |                                        |
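
To check roughly how many waits a given spin limit would cover, here is a minimal Python sketch (not part of the patch) that brackets the fraction of i915_wait_request calls finishing within a limit. funclatency uses power-of-two buckets, so when a limit falls inside a bucket, that fraction can only be bounded from both sides. The counts are copied from the two histograms above. The helper names are hypothetical.

```python
# bcc funclatency counts per log2 bucket, copied from the output above.
# Bucket i covers [0, 1] for i == 0, else [2**i, 2**(i + 1) - 1] microseconds.
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def bucket_bounds(i):
    """Inclusive (low, high) bounds in microseconds of log2 bucket i."""
    low = 0 if i == 0 else 2 ** i
    return low, 2 ** (i + 1) - 1


def fraction_within(counts, limit_us):
    """Return (lower, upper) bounds on the fraction of calls <= limit_us.

    lower counts only buckets lying entirely at or below the limit;
    upper also counts the bucket that straddles it.
    """
    total = sum(counts)
    lower = upper = 0
    for i, count in enumerate(counts):
        low, high = bucket_bounds(i)
        if high <= limit_us:
            lower += count
        if low <= limit_us:
            upper += count
    return lower / total, upper / total


for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
    for limit in (2, 20):
        lo, hi = fraction_within(counts, limit)
        print(f"{name}: waits <= {limit}us: between {lo:.1%} and {hi:.1%}")
```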

Picking 20us for the context-switch busyspin has the dual advantage of
catching the most frequent short waits while avoiding the cost of a
context switch. 20us is a typical latency of two context switches, i.e.
the cost of going to sleep and being woken again, excluding the secondary
effects of cache flushing.
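
The trade-off above is the general spin-then-sleep pattern. Poll for up to a fixed budget, and pay for the sleep and wake-up only if the event has not fired by then. The sketch below shows that pattern in Python with threading.Event. It is only an illustration and not i915's in-kernel implementation. The names spin_then_wait, spin_us and timeout_s are hypothetical, and spin_us plays the role of the DRM_I915_SPIN_REQUEST_CS budget.

```python
import threading
import time
from typing import Optional


def spin_then_wait(done: threading.Event, spin_us: float,
                   timeout_s: Optional[float] = None) -> str:
    """Busy-poll `done` for up to spin_us microseconds, then block on it.

    Returns "spin" if the event fired during the busy-poll (no sleep was
    needed), "sleep" if it fired during the blocking wait, or "timeout".
    """
    deadline = time.perf_counter() + spin_us / 1_000_000
    while time.perf_counter() < deadline:
        if done.is_set():
            return "spin"  # caught while spinning: no context switch paid
    # Spin budget exhausted: stop burning CPU and sleep until woken.
    return "sleep" if done.wait(timeout_s) else "timeout"


if __name__ == "__main__":
    ev = threading.Event()
    threading.Timer(0.001, ev.set).start()  # "request" completes after ~1 ms
    print(spin_then_wait(ev, spin_us=20))   # likely "sleep": 1 ms exceeds 20 us
```

A larger spin_us turns more waits into the "spin" outcome. The cost is that every wait which ends up sleeping anyway first burns up to spin_us of CPU time. In CPython the interpreter overhead makes very small budgets coarse, so this only demonstrates the control flow.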

The next thing I wanted to ask about is the cumulative time spent spinning versus the test duration, or in other words, the CPU usage before and after the change.

And of course: was the benefit in benchmark results measurable, by how much, and what does perf per Watt say?
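
The histograms in the patch allow a rough estimate of cumulative spin time. The sketch below uses a simple assumed model, which is not how the driver accounts for spinning: each i915_wait_request call spins once, for min(wait, budget). The Kconfig help text suggests the spin happens after each wake-up, so a single call may spin more than once. Changing the budget would also reshape the histogram. Treat the result as an order-of-magnitude figure. The function name is hypothetical. Dividing the result by the benchmark run time, which the patch does not state, would give the CPU-usage figure asked about.

```python
# bcc funclatency counts per log2 bucket, copied from the output in the patch.
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def estimate_spin_time_us(counts, budget_us):
    """Crude estimate of total busy-spin time in microseconds.

    Assumed model: every call spins once, for min(wait, budget_us).
    Bucket i holds waits of at most 2**(i + 1) - 1 us, so using that
    ceiling makes the result an upper bound under this model.
    """
    total = 0
    for i, count in enumerate(counts):
        bucket_high = 2 ** (i + 1) - 1
        total += count * min(bucket_high, budget_us)
    return total


for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
    for budget in (2, 20):
        ms = estimate_spin_time_us(counts, budget) / 1000
        print(f"{name}: budget {budget}us -> at most ~{ms:.1f} ms spinning")
```

Under this model, going from 2us to 20us raises the estimate roughly sevenfold on Broadwell and eightfold on Broxton. Whether that matters depends on the run time and on the measured benchmark gain.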

Regards,

Tvrtko

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
---
 drivers/gpu/drm/i915/Kconfig.profile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index a1aed0e2aad5..c8fe5754466c 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -11,7 +11,7 @@ config DRM_I915_SPIN_REQUEST_IRQ
 config DRM_I915_SPIN_REQUEST_CS
 	int
-	default 2 # microseconds
+	default 20 # microseconds
 	help
 	  After sleeping for a request (GPU operation) to complete, we will
 	  be woken up on the completion of every request prior to the one

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



