Re: Slow memory access when using OpenCL without X11

To reproduce the issue, only the tiny cl_slow_test.cpp is needed; it is attached to my first e-mail.
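
For readers without the attachment, here is a minimal sketch of how a "speed ... mbytes/s" figure like the ones quoted below could be measured on the CPU side. This is NOT the attached cl_slow_test.cpp: the function name, buffer size and iteration count are assumptions made for illustration, and the OpenCL context creation that triggers the slowdown is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not taken from cl_slow_test.cpp: copies `bytes` bytes
// `iterations` times with memcpy and returns the average throughput in
// mbytes/s (1 mbyte = 1e6 bytes here).
double measure_memcpy_mbytes_per_sec(std::size_t bytes, int iterations) {
    assert(bytes > 0 && iterations > 0);

    std::vector<char> src(bytes, 1);
    std::vector<char> dst(bytes, 0);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // Change one source byte per pass so the passes are not identical.
        src[0] = static_cast<char>(i);
        std::memcpy(dst.data(), src.data(), bytes);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read the destination once so the copies have an observable effect.
    volatile char sink = dst[bytes - 1];
    (void)sink;

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double total_mbytes = static_cast<double>(bytes) * iterations / 1e6;
    return total_mbytes / seconds;
}
```

Calling measure_memcpy_mbytes_per_sec(256u << 20, 10) in a loop, before and after an OpenCL context is opened, would give numbers comparable in spirit to the ones in this thread. The buffer should be much larger than the CPU caches so that the copy actually hits system memory.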

System information is as follows:
CPU: Ryzen5 2400G
Main board: Gigabyte AMD B450 AORUS mini itx: https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
BIOS: F5 8.47 MB 2019/01/25 (latest)
OS: Ubuntu 18.04 LTS
rocm-opencl-dev installation:
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt install rocm-opencl-dev

Exactly the same issue also happens with this board: https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf

I have MSI and ASRock mini-ITX boards ready as well. So far I haven't gotten amdgpu and OpenCL working on them, but I'll try again tomorrow.
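
On the question quoted below of whether rocm-smi misreports mclk after "--setmclk 2": one way to cross-check is to read the amdgpu sysfs file pp_dpm_mclk directly, where the driver marks the active level with a "*". The sketch below assumes the GPU is card0 and that the file uses the usual "index: frequency" layout. The helper names are made up for illustration. Note that rocm-smi most likely reads the same sysfs data, so this only shows what the driver believes, not what the SMU firmware actually applied.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the frequency text of the level marked active with '*',
// e.g. "1200Mhz", or an empty string if no line is marked.
// Assumed input layout (one level per line): "2: 1200Mhz *"
std::string active_dpm_level(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.find('*') == std::string::npos) {
            continue;  // not the active level
        }
        std::istringstream fields(line);
        std::string index;  // e.g. "2:"
        std::string freq;   // e.g. "1200Mhz"
        fields >> index >> freq;
        return freq;
    }
    return "";
}

// Reads the active memory clock from sysfs. The default path assumes the
// APU is card0; an unreadable file yields an empty string.
std::string read_active_mclk(
    const std::string& path = "/sys/class/drm/card0/device/pp_dpm_mclk") {
    std::ifstream file(path);
    return active_dpm_level(file);
}
```

Polling read_active_mclk() while cl_slow_test runs would show whether the reported level really stays at 1200Mhz in the forced case, or whether it drops back between rocm-smi invocations.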

--
Lauri


On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix <Felix.Kuehling@xxxxxxx> wrote:
Hi Lauri,

I still think the SMU is doing something funny, but rocm-smi isn't
showing enough information to really see what's going on.

On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete
GPUs, the SMU firmware is not loaded by the driver. You could try
updating your system BIOS to the latest version available from your main
board vendor and see if that makes a difference. It may include a newer
version of the SMU firmware, potentially with a fix.

If that doesn't help, we'd have to reproduce the problem in house to see
what's happening, which may require the same main board and BIOS version
you're using. We can ask our SMU firmware team if they've ever
encountered your type of problem. But I don't want to give you too much
hope. It's a tricky problem involving HW, firmware and multiple driver
components in a fairly unusual configuration.

Regards,
   Felix

On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:
> What I observe is that moving the mouse made the memory speed go up
> and also it made mclk=1200Mhz in rocm-smi output.
> However if I force mclk to 1200Mhz myself then memory speed is still
> slow.
>
> So rocm-smi output when memory speed went fast due to mouse movement:
> rocm-smi
> ========================        ROCm System Management Interface
> ========================
> ================================================================================================
> GPU   Temp   AvgPwr   SCLK    MCLK    PCLK      Fan     Perf   
> PwrCap   SCLK OD   MCLK OD GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0     44.0c  N/A      400Mhz  1200Mhz N/A       0%      manual  N/A   
>   0%        0%  N/A
> ================================================================================================
> ========================               End of ROCm SMI Log           
>   ========================
>
> And rocm-smi output when I forced memclk=1200MHz myself:
> rocm-smi --setmclk 2
> rocm-smi
> ========================        ROCm System Management Interface
> ========================
> ================================================================================================
> GPU   Temp   AvgPwr   SCLK    MCLK    PCLK      Fan     Perf   
> PwrCap   SCLK OD   MCLK OD GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0     39.0c  N/A      400Mhz  1200Mhz N/A       0%      manual  N/A   
>   0%        0%  N/A
> ================================================================================================
> ========================               End of ROCm SMI Log           
>   ========================
>
> So only difference is that temperature shows 44c when memory speed was
> fast and 39c when it was slow. But mclk was 1200MHz and sclk was
> 400MHz in both cases.
> Can it be that rocm-smi just has a bug in reporting and mclk was not
> actually 1200MHz when I forced it with rocm-smi --setmclk 2 ?
> That would explain the different behaviour..
>
> If so then is there a programmatic way how to really guarantee the
> high speed mclk? Basically I want do something similar in my program
> what happens if I move
> the mouse in desktop env and this way guarantee the normal memory
> speed each time the program starts.
>
> --
> Lauri
>
>
> On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander
> <Alexander.Deucher@xxxxxxx <mailto:Alexander.Deucher@xxxxxxx>> wrote:
>
>     Forcing the sclk and mclk high may impact the CPU frequency since
>     they share TDP.
>
>     Alex
>     ------------------------------------------------------------------------
>     *From:* amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx
>     <mailto:amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx>> on behalf of Lauri
>     Ehrenpreis <laurioma@xxxxxxxxx <mailto:laurioma@xxxxxxxxx>>
>     *Sent:* Tuesday, March 12, 2019 5:31 PM
>     *To:* Kuehling, Felix
>     *Cc:* Tom St Denis; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>     <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
>     *Subject:* Re: Slow memory access when using OpenCL without X11
>     However it's not only related to mclk and sclk. I tried this:
>     rocm-smi  --setsclk 2
>     rocm-smi  --setmclk 3
>     rocm-smi
>     ========================        ROCm System Management Interface
>     ========================
>     ================================================================================================
>     GPU   Temp   AvgPwr   SCLK    MCLK    PCLK          Fan     Perf 
>       PwrCap   SCLK OD  MCLK OD  GPU%
>     GPU[0] : WARNING: Empty SysFS value: pclk
>     GPU[0] : WARNING: Unable to read
>     /sys/class/drm/card0/device/gpu_busy_percent
>     0     34.0c  N/A      1240Mhz 1333Mhz N/A           0%     
>     manual  N/A      0% 0%       N/A
>     ================================================================================================
>     ========================               End of ROCm SMI Log
>     ========================
>
>     ./cl_slow_test 1
>     got 1 platforms 1 devices
>     speed 3919.777100 avg 3919.777100 mbytes/s
>     speed 3809.373291 avg 3864.575195 mbytes/s
>     speed 585.796814 avg 2771.649170 mbytes/s
>     speed 188.721848 avg 2125.917236 mbytes/s
>     speed 188.916367 avg 1738.517090 mbytes/s
>
>     So despite forcing max sclk and mclk the memory speed is still slow..
>
>     --
>     Lauri
>
>
>     On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis
>     <laurioma@xxxxxxxxx <mailto:laurioma@xxxxxxxxx>> wrote:
>
>         IN the case when memory is slow, the rocm-smi outputs this:
>         ========================        ROCm System Management
>         Interface ========================
>         ================================================================================================
>         GPU   Temp   AvgPwr   SCLK    MCLK    PCLK          Fan   
>          Perf    PwrCap   SCLK OD  MCLK OD  GPU%
>         GPU[0] : WARNING: Empty SysFS value: pclk
>         GPU[0] : WARNING: Unable to read
>         /sys/class/drm/card0/device/gpu_busy_percent
>         0     30.0c  N/A      400Mhz  933Mhz  N/A           0%     
>         auto    N/A      0% 0%       N/A
>         ================================================================================================
>         ========================               End of ROCm SMI Log
>         ========================
>
>         normal memory speed case gives following:
>         ========================        ROCm System Management
>         Interface ========================
>         ================================================================================================
>         GPU   Temp   AvgPwr   SCLK    MCLK    PCLK          Fan   
>          Perf    PwrCap   SCLK OD  MCLK OD  GPU%
>         GPU[0] : WARNING: Empty SysFS value: pclk
>         GPU[0] : WARNING: Unable to read
>         /sys/class/drm/card0/device/gpu_busy_percent
>         0     35.0c  N/A      400Mhz  1200Mhz N/A           0%     
>         auto    N/A      0% 0%       N/A
>         ================================================================================================
>         ========================               End of ROCm SMI Log
>         ========================
>
>         So there is a difference in MCLK - can this cause such a huge
>         slowdown?
>
>         --
>         Lauri
>
>         On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix
>         <Felix.Kuehling@xxxxxxx <mailto:Felix.Kuehling@xxxxxxx>> wrote:
>
>             [adding the list back]
>
>             I'd suspect a problem related to memory clock. This is an
>             APU where
>             system memory is shared with the CPU, so if the SMU
>             changes memory
>             clocks that would affect CPU memory access performance. If
>             the problem
>             only occurs when OpenCL is running, then the compute power
>             profile could
>             have an effect here.
>
>             Laurie, can you monitor the clocks during your tests using
>             rocm-smi?
>
>             Regards,
>                Felix
>
>             On 2019-03-11 1:15 p.m., Tom St Denis wrote:
>             > Hi Lauri,
>             >
>             > I don't have ROCm installed locally (not on that team at
>             AMD) but I
>             > can rope in some of the KFD folk and see what they say :-).
>             >
>             > (in the mean time I should look into installing the ROCm
>             stack on my
>             > Ubuntu disk for experimentation...).
>             >
>             > Only other thing that comes to mind is some sort of
>             stutter due to
>             > power/clock gating (or gfx off/etc).  But that typically
>             affects the
>             > display/gpu side not the CPU side.
>             >
>             > Felix:  Any known issues with Raven and ROCm interacting
>             over memory
>             > bus performance?
>             >
>             > Tom
>             >
>             > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis
>             <laurioma@xxxxxxxxx <mailto:laurioma@xxxxxxxxx>
>             > <mailto:laurioma@xxxxxxxxx <mailto:laurioma@xxxxxxxxx>>>
>             wrote:
>             >
>             >     Hi!
>             >
>             >     The 100x memory slowdown is hard to belive indeed. I
>             attached the
>             >     test program with my first e-mail which depends only on
>             >     rocm-opencl-dev package. Would you mind compiling it
>             and checking
>             >     if it slows down memory for you as well?
>             >
>             >     steps:
>             >     1) g++ cl_slow_test.cpp -o cl_slow_test -I
>             >     /opt/rocm/opencl/include/ -L
>             /opt/rocm/opencl/lib/x86_64/  -lOpenCL
>             >     2) logout from desktop env and disconnect
>             hdmi/diplayport etc
>             >     3) log in over ssh
>             >     4) run the program ./cl_slow_test 1
>             >
>             >     For me it reproduced even without step 2 as well but
>             less
>             >     reliably. moving mouse for example could make the
>             memory speed
>             >     fast again.
>             >
>             >     --
>             >     Lauri
>             >
>             >
>             >
>             >     On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis
>             <tstdenis82@xxxxxxxxx <mailto:tstdenis82@xxxxxxxxx>
>             >     <mailto:tstdenis82@xxxxxxxxx
>             <mailto:tstdenis82@xxxxxxxxx>>> wrote:
>             >
>             >         Hi Lauri,
>             >
>             >         There's really no connection between the two
>             other than they
>             >         run in the same package.  I too run a 2400G (as my
>             >         workstation) and I got the same ~6.6GB/sec
>             transfer rate but
>             >         without a CL app running ...  The only logical
>             reason is your
>             >         CL app is bottlenecking the APUs memory bus but
>             you claim
>             >         "simply opening a context is enough" so
>             something else is
>             >         going on.
>             >
>             >         Your last reply though says "with it running in the
>             >         background" so it's entirely possible the CPU
>             isn't busy but
>             >         the package memory controller (shared between
>             both the CPU and
>             >         GPU) is busy.  For instance running xonotic in a
>             1080p window
>             >         on my 4K display reduced the memory test to
>             5.8GB/sec and
>             >         that's hardly a heavy memory bound GPU app.
>             >
>             >         The only other possible connection is the GPU is
>             generating so
>             >         much heat that it's throttling the package which
>             is also
>             >         unlikely if you have a proper HSF attached (I
>             use the ones
>             >         that came in the retail boxes).
>             >
>             >         Cheers,
>             >         Tom
>             >
>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx