Comment # 14
on bug 96897
from Dieter Nützel
(In reply to Jan Vesely from comment #13)
> Initial support for cl_khr_fp16 builtins has been added to libclc in r332677.
> It should be enough to run clpeak.
> clpeak still takes few mins to compile the kernels (~7mins on my carrizo
> laptop)

GREAT work Jan!

After about 3 min 12 s, the float tests started crunching on my X3470 Xeon. Only one core is used for kernel compilation, so it runs at 3.6 GHz in turbo mode.
My desktop was frozen during the float 'Global memory bandwidth (GBPS)' test and partly frozen during 'Double-precision compute (GFLOPS)'.
The whole benchmark finished after 6 min 17 s.

/home/dieter> time clpeak

Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 4.16.9-1.g4f45b1e-default, LLVM 7.0.0)
    Driver version  : 18.2.0-devel (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 2.64
      float2  : 2.64
      float4  : 2.64
      float8  : 2.54
      float16 : 1.45

    Single-precision compute (GFLOPS)
      float   : 6341.87
      float2  : 6131.34
      float4  : 6105.61
      float8  : 5933.91
      float16 : 5939.44

    half-precision compute (GFLOPS)
      half   : 6307.47
      half2  : 6193.25
      half4  : 6114.34
      half8  : 5729.57
      half16 : 6047.90

    Double-precision compute (GFLOPS)
      double   : 404.52
      double2  : 404.41
      double4  : 404.06
      double8  : 403.08
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1222.75
      int2  : 1213.90
      int4  : 1210.72
      int8  : 1208.57
      int16 : 1213.99

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 8.78
      enqueueReadBuffer          : 4.86
      enqueueMapBuffer(for read) : 4871.79
        memcpy from mapped ptr   : 4.94
      enqueueUnmap(after write)  : 3528.56
        memcpy to mapped ptr     : 4.94

    Kernel launch latency : 293.57 us

206.285u 3.765s 6:17.14 55.6% 0+0k 0+0io 0pf+0w

For reference, AMD 17.40:

/home/dieter> time clpeak

Platform: AMD Accelerated Parallel Processing
  Device: Ellesmere
    Driver version  : 2482.3 (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 202.59
      float2  : 209.30
      float4  : 209.63
      float8  : 162.15
      float16 : 138.41

    Single-precision compute (GFLOPS)
      float   : 6342.71
      float2  : 6374.96
      float4  : 6178.29
      float8  : 5973.53
      float16 : 6018.79

    half-precision compute (GFLOPS)
      half   : 6306.97
      half2  : 6366.06
      half4  : 6350.41
      half8  : 6154.31
      half16 : 6280.47

    Double-precision compute (GFLOPS)
      double   : 404.64
      double2  : 404.38
      double4  : 398.54
      double8  : 403.25
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1206.77
      int2  : 1221.26
      int4  : 1225.83
      int8  : 1225.88
      int16 : 1227.35

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 9.03
      enqueueReadBuffer          : 5.08
      enqueueMapBuffer(for read) : 149130.81
        memcpy from mapped ptr   : 5.09
      enqueueUnmap(after write)  : 75882.81
        memcpy to mapped ptr     : 5.08

    Kernel launch latency : 93.33 us

23.056u 1.592s 1:08.29 36.0% 0+0k 0+0io 0pf+0w
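As a rough sanity check on the compute figures, the theoretical peak of this card can be worked out from the reported 36 compute units and 1411 MHz clock. This assumes the usual GCN figures of 64 lanes per compute unit, one FMA (2 FLOPs) per lane per clock, and a 1/16 double-precision rate on Polaris; none of these are stated in the clpeak output itself.

```latex
\begin{aligned}
P_{\mathrm{FP32}} &= 36 \times 64 \times 2 \times 1.411\ \mathrm{GHz} \approx 6501.9\ \mathrm{GFLOPS} \\
P_{\mathrm{FP64}} &= P_{\mathrm{FP32}} / 16 \approx 406.4\ \mathrm{GFLOPS}
\end{aligned}
```

Under these assumptions both drivers reach roughly 97.5% of P_FP32 in the scalar float test (6341.87 vs. 6342.71 GFLOPS) and roughly 99.5% of P_FP64 in the scalar double test (404.52 vs. 404.64 GFLOPS), so the ALU-bound tests look fine on Clover; the differences show up in memory bandwidth and kernel launch latency.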
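A small sketch that puts a few of the Clover and AMD 17.40 figures side by side as ratios. The values are copied from the two runs above; the dictionary keys are hypothetical labels chosen for this sketch, not clpeak identifiers. Ratios are oriented so that a value above 1 means the AMD driver did better.

```python
"""Compare selected clpeak results: Mesa Clover vs. AMD 17.40 on the same RX 580."""

# Values copied from the two clpeak runs in this comment.
# The keys are hypothetical labels for this sketch, not clpeak identifiers.
CLOVER = {
    "global_mem_bw_float_gbps": 2.64,
    "sp_compute_float_gflops": 6341.87,
    "dp_compute_double_gflops": 404.52,
    "kernel_launch_latency_us": 293.57,
    "wall_time_s": 6 * 60 + 17.14,  # 6:17.14 reported by `time`
}
AMD_17_40 = {
    "global_mem_bw_float_gbps": 202.59,
    "sp_compute_float_gflops": 6342.71,
    "dp_compute_double_gflops": 404.64,
    "kernel_launch_latency_us": 93.33,
    "wall_time_s": 1 * 60 + 8.29,  # 1:08.29 reported by `time`
}

# Metrics where a smaller number is better; the ratio is flipped for these.
LOWER_IS_BETTER = {"kernel_launch_latency_us", "wall_time_s"}


def advantage(reference: dict, candidate: dict) -> dict:
    """Return the reference's advantage per metric (>1 means reference is better)."""
    result = {}
    for key, cand_value in candidate.items():
        ref_value = reference[key]
        if key in LOWER_IS_BETTER:
            # Time-like metric: how many times longer the candidate took.
            result[key] = cand_value / ref_value
        else:
            # Throughput metric: how many times faster the reference was.
            result[key] = ref_value / cand_value
    return result


if __name__ == "__main__":
    for metric, factor in advantage(AMD_17_40, CLOVER).items():
        print(f"{metric:28s} AMD 17.40 vs Clover: {factor:6.2f}x")
```

With these numbers the float global memory bandwidth gap is roughly 77x, kernel launch latency roughly 3.1x and total wall-clock time (kernel compilation included) roughly 5.5x, while the compute results differ by well under 1%.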
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel