Hi,

On 2/20/24 16:15, Alex Deucher wrote:
> On Tue, Feb 20, 2024 at 10:03 AM Linux regression tracking (Thorsten
> Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>>
>> On 20.02.24 15:45, Alex Deucher wrote:
>>> On Mon, Feb 19, 2024 at 9:47 AM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>>>>
>>>> On 17.02.24 14:30, Greg KH wrote:
>>>>> On Sat, Feb 17, 2024 at 02:01:54PM +0100, Roman Benes wrote:
>>>>>> Minimum power limit on latest(6.7+) kernels is 190W for my GPU (RX 6700XT,
>>>>>> mesa, archlinux) and I cannot get power cap as low as before(to 115W),
>>>>>> neither with Corectrl, LACT or TuxClocker and /sys have a variable read-only
>>>>>> even for root. This is not of above apps issue but of the kernel, I read
>>>>>> similar issues from other bug reports of above apps. I downgraded to v6.6.10
>>>>>> kernel and my 115W(under power)cap work again as before.
>>>>>
>>>> For the record and everyone that lands here: the cause is known now
>>>> (it's 1958946858a62b ("drm/amd/pm: Support for getting power1_cap_min
>>>> value") [v6.7-rc1]) and the issue afaics tracked here:
>>>>
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
>>>>
>>>> Other mentions:
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3137
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/2992
>>>>
>>>> Haven't seen any statement from the amdgpu developers (now CCed) yet on
>>>> this there (but might have missed something!). From what I can see I
>>>> assume this will likely be somewhat tricky to handle, as a revert
>>>> overall might be a bad idea here. We'll see I guess.
>>>
>>> The change aligns the driver what has been validated on each board
>>> design. Windows uses the same limits. Using values lower than the
>>> validated range can lead to undefined behavior and could potentially
>>> damage your hardware.
>>
>> Thx for the reply! Yeah, I was expecting something along those lines.
>>
>> Nevertheless it afaics still is a regression in the eyes of many users.
>> I'm not sure how Linus feels about this, but I wonder if we can find
>> some solution here so that users that really want to, can continue to do
>> what was possible out-of-the box before. Is that possible to realize or
>> even supported already?
>>
>> And sure, those users would be running their hardware outside of its
>> specifications. But is that different from overclocking (which the
>> driver allows, doesn't it? If not by all means please correct me!)?
>
> Sure. The driver has always had upper bound limits for overclocking,
> this change adds lower bounds checking for underclocking as well.
> When the silicon validation teams set the bounding box for a device,
> they set a range of values where it's reasonable to operate based on
> the characteristics of the design.
>
> If we did want to allow extended underclocking, we need a big warning
> in the logs at the very least.

Requiring a module-option to be set to allow this, as well as
a big warning in the logs sounds like a good solution to me.

Regards,

Hans


>>>> Roman posted something that apparently was meant to go to the list, so
>>>> let me put it here:
>>>>
>>>> """
>>>> UPDATE: User fililip already posted patch, but it need to be merged,
>>>> discussion is on gitlab link below.
>>>>
>>>> (PS: I hope I am replying correctly to "all" now? - using original addr.)
>>>>
>>>>
>>>>> it seems that commit was already found(see user's 'fililip' comment):
>>>>>
>>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
>>>>> commit 1958946858a62b6b5392ed075aa219d199bcae39
>>>>> Author: Ma Jun <Jun.Ma2@xxxxxxx>
>>>>> Date: Thu Oct 12 09:33:45 2023 +0800
>>>>>
>>>>> drm/amd/pm: Support for getting power1_cap_min value
>>>>>
>>>>> Support for getting power1_cap_min value on smu13 and smu11.
>>>>> For other Asics, we still use 0 as the default value.
>>>>>
>>>>> Signed-off-by: Ma Jun <Jun.Ma2@xxxxxxx>
>>>>> Reviewed-by: Kenneth Feng <kenneth.feng@xxxxxxx>
>>>>> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>>>>>
>>>>> However, this is not good as it remove under-powering range too far. I
>>>> was getting only about 7% less performance but 90W(!) less consumption
>>>> when set to my 115W before. Also I wonder if we as a OS of options and
>>>> freedom have to stick to such very high reference for min values without
>>>> ability to override them through some sys ctrls. Commit was done by amd
>>>> guy and I wonder if because of maybe this post that I made few months
>>>> ago(business strategy?):
>>>>>
>>>>>
>>>> https://www.reddit.com/r/Amd/comments/183gye7/rx_6700xt_from_230w_to_capped_115w_at_only_10/
>>>>>
>>>>> This is not a dangerous OC upwards where I can understand desire to
>>>> protect HW, it is downward, having min cap at 190W when card pull on
>>>> 115W almost same speed is IMO crazy to deny. We don't talk about default
>>>> or reference values here either, just a move to lower the range of
>>>> options for whatever reason.
>>>>>
>>>>> I don't know how much power you guys have over them, but please
>>>> consider either reverting this change, or give us an option to set
>>>> min_cap through say /sys (right now param is readonly, even for root).
>>>>>
>>>>>
>>>>> Thank you in advance for looking into this, with regards: Romano
>>>> """
>>>>
>>>> And while at it, let me add this issue to the tracking as well
>>>>
>>>> [TLDR: I'm adding this report to the list of tracked Linux kernel
>>>> regressions; the text you find below is based on a few templates
>>>> paragraphs you might have encountered already in similar form.
>>>> See link in footer if these mails annoy you.]
>>>>
>>>> Thanks for the report. To be sure the issue doesn't fall through the
>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>>>> tracking bot:
>>>>
>>>> #regzbot introduced 1958946858a62b /
>>>> #regzbot title drm: amdgpu: under-powering broke
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> That page also explains what to do if mails like this annoy you.