Comment # 83
on bug 110674
from ReddestDream
> Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and 5.2.7 so the issue is not the value from the dpm table. The dpm table is probably correct.

Fantastic! Glad you tested this. I had suspected the hard_min_level was bogus and that's why it was failing: the card was rejecting the bogus value. Glad to know that's not the case.

> However, what is interesting is that it doesn't always fail.

Yeah. I've had boots where I have my 2 4K DP monitors plugged in and I don't get a powerplay error on boot. In fact, it can run for a while and seem stable. But then the powerplay errors suddenly (not related to some high load on the card) start showing up again and the graphics become unstable. Similarly, others have reported that on hotplugging a second monitor after boot, the powerplay errors will start showing up. So, maybe there is a timing problem involved with sending the message. It's generally a question of when rather than if it's going to fail.

> 1. vega20_set_fclk_to_highest_dpm_level is called twice between the "ring vce2" line and "Initialized"

Is it always called twice? Even on 5.2.7? Because it looks like it might get called two times right before "Initialized" on 5.0.13 but then only once on 5.2.7 before "Initialized" kicks in. Maybe "Initialized" is interrupting on 5.2.7 but not on 5.0.13. It's possible that initialization of the card is messing up values that powerplay needs to read off the card, or making the card unavailable for receiving messages, or something . . .

> So initialization is happening between (and possibly a result of) sending the message and getting the response

Yeah. Something is definitely happening while vega20_set_uclk_to_highest_dpm_level is running . . . Not 100% sure that's really problematic tho . . . But it could be an atomicity issue. Need to figure out exactly what is generating the line "[drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0." Looks like it's coming from the drm core rather than amdgpu specifically.

> I'm going to see if I can disable/revert BACO entirely to at least rule it out.

I thought BACO was reverted for Vega 20 here:

https://github.com/torvalds/linux/commit/7db329e57b90ddebcb58fc88eedbb3082d22a957#diff-8a4d25be8ad5d9c3ff27bb54b678dab2

Your commit seems to have been introduced in 5.2-rc1, not 5.1.
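
On the "called twice vs. once" question, a rough sketch like the one below could make the 5.0.13 vs. 5.2.7 comparison less error-prone than eyeballing dmesg. It counts how many log lines mention a function between the "ring vce2" line and the "[drm] Initialized amdgpu" line. The helper name and the sample log are made up for illustration, and it assumes the debug output puts the function name somewhere on the line, which depends on whatever printk was added.

```python
def count_calls_in_window(log_text, func_name,
                          start_marker="ring vce2",
                          end_marker="[drm] Initialized amdgpu"):
    """Count lines mentioning func_name between start_marker and end_marker.

    Returns None if the window never opens or never closes, so a truncated
    log is not mistaken for "zero calls".
    """
    counting = False
    count = 0
    for line in log_text.splitlines():
        if not counting:
            # Skip everything until the start marker shows up.
            counting = start_marker in line
            continue
        if end_marker in line:
            return count
        if func_name in line:
            count += 1
    return None


# Synthetic log for illustration only; NOT real dmesg output.
sample = "\n".join([
    "example: vega20_set_fclk_to_highest_dpm_level",  # before window, ignored
    "example: ring vce2 line",
    "example: vega20_set_fclk_to_highest_dpm_level",
    "example: vega20_set_fclk_to_highest_dpm_level",
    "[drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0.",
    "example: vega20_set_fclk_to_highest_dpm_level",  # after window, ignored
])

print(count_calls_in_window(sample, "vega20_set_fclk_to_highest_dpm_level"))  # 2
```

Running it over the saved dmesg from each kernel across several boots would show whether the count really differs, or whether it just varies from boot to boot.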