I have 2 Gigabyte RX580's in my desktop workstation. I'm running Arch Linux with KDE Plasma on the 5.0.6 kernel.
The cards themselves work fine, except when I have two 1080p HDMI monitors plugged into one of them: one in a native HDMI port, the other through a passive DVI->HDMI adapter.
This causes the following problems at idle:

1. Memory clock is effectively locked at 2000MHz at all times
2. Core clock is constantly in a high-frequency P-state
3. Temperatures are increased
4. Power consumption is increased (significantly)
5. PCI bus is always at full speed
6. Forcing core clock to 300MHz uses a higher than usual voltage
Below is an excerpt from the rocm-smi utility for the automatic defaults (I have omitted overclock and powercap values for formatting purposes)
2 Monitors connected to GPU 0, No monitors connected to GPU 1

ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
0    44.0c  36.193W  1145Mhz  2000Mhz  8.0GT/s, x16  40.0%  auto  0%
1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
===============================================================================
End of ROCm SMI Log
GPU 0 is idle and yet running SCLK and MCLK at unnecessary power levels.
GPU 1 is truly idle.

Regarding GPU 0 temperature, I have actually set up a daemon to run the fan at a consistent rate to prevent it from constantly peaking.
-------------------------------------------------------------------------------
1 Monitor connected to GPU 0, No monitors connected to GPU 1

ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
0    36.0c  28.103W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
===============================================================================
2 Monitors connected to GPU 0, No monitors connected to GPU 1

ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
0    44.0c  31.086W  300Mhz   2000Mhz  2.5GT/s, x8   40.0%  low   0%
1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   low   0%
===============================================================================
Peculiarly, even with the low power state forced, the GPU runs at a voltage (950mV) in excess of what is required for 300MHz (750mV).
===============================================================================
cat /sys/kernel/debug/dri/0/amdgpu_pm_info        jupiter: Mon Apr 8 21:57:29 2019

Clock Gating Flags Mask: 0x3fbcf
        Graphics Medium Grain Clock Gating: On
        Graphics Medium Grain memory Light Sleep: On
        Graphics Coarse Grain Clock Gating: On
        Graphics Coarse Grain memory Light Sleep: On
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: On
        Graphics Run List Controller Light Sleep: On
        Graphics 3D Coarse Grain Clock Gating: Off
        Graphics 3D Coarse Grain memory Light Sleep: Off
        Memory Controller Light Sleep: On
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: Off
        System Direct Memory Access Medium Grain Clock Gating: On
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: On
        Unified Video Decoder Medium Grain Clock Gating: On
        Video Compression Engine Medium Grain Clock Gating: On
        Host Data Path Light Sleep: On
        Host Data Path Medium Grain Clock Gating: On
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: Off
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: Off

GFX Clocks and Power:
        2000 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        950 mV (VDDGFX)
        31.14 W (average GPU)

GPU Temperature: 43 C
GPU Load: 0 %

UVD: Disabled

VCE: Disabled
===============================================================================
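
For anyone wanting to check their own card for the same symptom, the readings in the amdgpu_pm_info dump above can be pulled out mechanically. The following is only a rough sketch: the function names are made up for illustration, and the idle thresholds (300 MHz memory clock, 750 mV at a 300 MHz core clock) are simply the figures quoted in this message, not values taken from the driver.

```python
import re

# Matches readings such as "2000 MHz (MCLK)" or "31.14 W (average GPU)".
READING = re.compile(r"(\d+(?:\.\d+)?)\s*(MHz|mV|W)\s*\(([^)]+)\)")

# Excerpt of the dump quoted in this message.
SAMPLE = (
    "2000 MHz (MCLK) 300 MHz (SCLK) 600 MHz (PSTATE_SCLK) "
    "1000 MHz (PSTATE_MCLK) 950 mV (VDDGFX) 31.14 W (average GPU)"
)


def parse_pm_info(text):
    """Return {label: (value, unit)} for every reading found in the text."""
    return {
        label: (float(value), unit)
        for value, unit, label in READING.findall(text)
    }


def idle_anomalies(readings, idle_mclk_mhz=300.0, idle_vddgfx_mv=750.0):
    """List readings that look too high for an idle GPU.

    The thresholds are assumptions taken from the figures in this post.
    """
    problems = []
    mclk = readings.get("MCLK", (0.0, "MHz"))[0]
    sclk = readings.get("SCLK", (0.0, "MHz"))[0]
    vdd = readings.get("VDDGFX", (0.0, "mV"))[0]
    if mclk > idle_mclk_mhz:
        problems.append(f"MCLK at {mclk:.0f} MHz, expected {idle_mclk_mhz:.0f} MHz")
    if sclk <= 300.0 and vdd > idle_vddgfx_mv:
        problems.append(f"VDDGFX at {vdd:.0f} mV with SCLK at {sclk:.0f} MHz")
    return problems


for line in idle_anomalies(parse_pm_info(SAMPLE)):
    print(line)
```

On a live system the text would come from reading /sys/kernel/debug/dri/0/amdgpu_pm_info as root instead of the embedded sample.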
Using amdgpu.ppfeaturemask=0xffffffff I am able to work around all of the above issues, but it requires me to manually set idle and performance clock speeds as needed. A 300MHz memory and core clock drives two 1080p HDMI displays just fine. However, this leads to screen tearing and visible green artefacts when *changing* core clock speeds, as if there is a synchronization issue. When running at a fixed speed, all is well.
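
As an illustration of the manual pinning described above, here is a minimal sketch that builds the sysfs writes for holding a card at fixed DPM levels. It assumes the usual amdgpu sysfs files (power_dpm_force_performance_level, pp_dpm_sclk, pp_dpm_mclk) under /sys/class/drm/card0/device, and it assumes level index 0 is the lowest (300MHz) state on this card; the helper names are hypothetical. By default it only prints the equivalent shell commands and writes nothing.

```python
def plan_fixed_clocks(card=0, sclk_level=0, mclk_level=0):
    """Return (path, value) pairs that pin the card to fixed DPM levels.

    The level indices are assumptions; list the real ones for a given card
    by reading pp_dpm_sclk and pp_dpm_mclk first.
    """
    dev = f"/sys/class/drm/card{card}/device"
    return [
        (f"{dev}/power_dpm_force_performance_level", "manual"),
        (f"{dev}/pp_dpm_sclk", str(sclk_level)),
        (f"{dev}/pp_dpm_mclk", str(mclk_level)),
    ]


def apply_plan(plan, dry_run=True):
    """Write each value, or only describe the writes when dry_run is set."""
    described = []
    for path, value in plan:
        described.append(f"echo {value} > {path}")
        if not dry_run:
            # Needs root; mirrors what `echo value > path` would do.
            with open(path, "w") as handle:
                handle.write(value + "\n")
    return described


for command in apply_plan(plan_fixed_clocks()):
    print(command)
```

Since the artefacts only show up while the clocks are changing, a script like this would be run once per desired state rather than toggled frequently.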
The temperatures alone show that power is being wasted.
I have a UPS that can measure system power consumption reasonably accurately (in 16W steps). At idle with default settings, letting the kernel and GPUs manage things themselves, I sometimes read ~196W idle power!
2 Monitors (auto)      -> 196W Idle
2 Monitors (low)       -> 160W Idle
2 Monitors (Force 300) -> 112-128W Idle
1 monitor              -> 96-128W Idle
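
To put numbers on the gap between these configurations, the readings above can be compared as ranges. This is just a small arithmetic sketch: the dictionary keys and the function name are made up for illustration, and it treats the quoted figures as exact bounds, ignoring the UPS's 16W step size.

```python
# Idle wall-power readings from the list above, as (min, max) in watts.
IDLE_WATTS = {
    "2 monitors (auto)": (196, 196),
    "2 monitors (low)": (160, 160),
    "2 monitors (force 300)": (112, 128),
    "1 monitor": (96, 128),
}


def excess_over(baseline, config, readings=IDLE_WATTS):
    """Return (min, max) watts by which `config` exceeds `baseline`."""
    base_lo, base_hi = readings[baseline]
    conf_lo, conf_hi = readings[config]
    return (conf_lo - base_hi, conf_hi - base_lo)


for name in ("2 monitors (force 300)", "1 monitor"):
    lo, hi = excess_over(name, "2 monitors (auto)")
    print(f"auto draws {lo}-{hi}W more than {name}")
```

With the figures quoted here, the automatic defaults come out 68-84W above the forced 300MHz setup and 68-100W above the single-monitor case.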
Even if my UPS isn't giving the exact true values, that delta is concerning.
This is a longstanding issue that has been bugging me for a while now, and it should really be fixed, as it carries quite a large thermal and power burden.
I have tried poking through the source code to figure this out, but no luck. Have I missed something? Is there a problem synchronizing display VSYNC on clock changes? Why is this happening? It's clearly not the right behaviour.
What can be done to fix this? Can I help?
Best Regards,
Rigo Reddig
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx