[Bug 96897] clpeak OpenCL benchmark hangs during compilation on Clover RadeonSI

Comment # 14 on bug 96897 from
(In reply to Jan Vesely from comment #13)
> Initial support for cl_khr_fp16 builtins has been added to libclc in r332677.
> It should be enough to run clpeak.
> clpeak still takes few mins to compile the kernels (~7mins on my carrizo
> laptop)

GREAT work Jan!

After about 3 min 12 sec the float tests started crunching on my X3470 Xeon
(only one core is used for the kernel compile => 3.6 GHz turbo mode).

My desktop was frozen during the float 'Global memory bandwidth (GBPS)' test
and partly frozen during 'Double-precision compute (GFLOPS)'.

The whole benchmark finished after 6 min 17 sec.

/home/dieter> time clpeak

Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 4.16.9-1.g4f45b1e-default, LLVM 7.0.0)
    Driver version  : 18.2.0-devel (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 2.64
      float2  : 2.64
      float4  : 2.64
      float8  : 2.54
      float16 : 1.45

    Single-precision compute (GFLOPS)
      float   : 6341.87
      float2  : 6131.34
      float4  : 6105.61
      float8  : 5933.91
      float16 : 5939.44

    half-precision compute (GFLOPS)
      half   : 6307.47
      half2  : 6193.25
      half4  : 6114.34
      half8  : 5729.57
      half16 : 6047.90

    Double-precision compute (GFLOPS)
      double   : 404.52
      double2  : 404.41
      double4  : 404.06
      double8  : 403.08
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1222.75
      int2  : 1213.90
      int4  : 1210.72
      int8  : 1208.57
      int16 : 1213.99

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 8.78
      enqueueReadBuffer          : 4.86
      enqueueMapBuffer(for read) : 4871.79
        memcpy from mapped ptr   : 4.94
      enqueueUnmap(after write)  : 3528.56
        memcpy to mapped ptr     : 4.94

    Kernel launch latency : 293.57 us

206.285u 3.765s 6:17.14 55.6%   0+0k 0+0io 0pf+0w


For reference, the same run with AMD 17.40:
/home/dieter> time clpeak

Platform: AMD Accelerated Parallel Processing
  Device: Ellesmere
    Driver version  : 2482.3 (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 202.59
      float2  : 209.30
      float4  : 209.63
      float8  : 162.15
      float16 : 138.41

    Single-precision compute (GFLOPS)
      float   : 6342.71
      float2  : 6374.96
      float4  : 6178.29
      float8  : 5973.53
      float16 : 6018.79

    half-precision compute (GFLOPS)
      half   : 6306.97
      half2  : 6366.06
      half4  : 6350.41
      half8  : 6154.31
      half16 : 6280.47

    Double-precision compute (GFLOPS)
      double   : 404.64
      double2  : 404.38
      double4  : 398.54
      double8  : 403.25
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1206.77
      int2  : 1221.26
      int4  : 1225.83
      int8  : 1225.88
      int16 : 1227.35

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 9.03
      enqueueReadBuffer          : 5.08
      enqueueMapBuffer(for read) : 149130.81
        memcpy from mapped ptr   : 5.09
      enqueueUnmap(after write)  : 75882.81
        memcpy to mapped ptr     : 5.08

    Kernel launch latency : 93.33 us

23.056u 1.592s 1:08.29 36.0%    0+0k 0+0io 0pf+0w
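
To put the two runs side by side, here is a rough sketch that computes how far apart Clover and the AMD driver are. The numbers are copied from the clpeak output above; the variable and function names are made up for illustration and are not part of clpeak.

```python
# Figures copied from the two clpeak runs above.
# Dictionary keys and helper names are hypothetical, chosen for this sketch.
clover = {
    "gmem_float_gbps": 2.64,         # Global memory bandwidth, float
    "launch_latency_us": 293.57,     # Kernel launch latency
    "wall_seconds": 6 * 60 + 17.14,  # 6:17.14 from `time`
}
amd = {
    "gmem_float_gbps": 202.59,
    "launch_latency_us": 93.33,
    "wall_seconds": 1 * 60 + 8.29,   # 1:08.29 from `time`
}


def factor(larger, smaller):
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# Higher bandwidth is better, so divide AMD by Clover.
bandwidth_gap = factor(amd["gmem_float_gbps"], clover["gmem_float_gbps"])
# Lower latency and wall time are better, so divide Clover by AMD.
latency_gap = factor(clover["launch_latency_us"], amd["launch_latency_us"])
wall_time_gap = factor(clover["wall_seconds"], amd["wall_seconds"])

print(f"global memory bandwidth (float): AMD is {bandwidth_gap:.1f}x Clover")
print(f"kernel launch latency: Clover is {latency_gap:.1f}x AMD")
print(f"total wall time: Clover is {wall_time_gap:.1f}x AMD")
```

The compute results (single, half, double, integer) are within a few percent of each other, so the gap is in global memory bandwidth, launch latency and total run time, which includes the kernel compile.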


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel