On Tue, Feb 20, 2024 at 10:42 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>
>
>
> On 20.02.24 16:27, Hans de Goede wrote:
> > Hi,
> >
> > On 2/20/24 16:15, Alex Deucher wrote:
> >> On Tue, Feb 20, 2024 at 10:03 AM Linux regression tracking (Thorsten
> >> Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On 20.02.24 15:45, Alex Deucher wrote:
> >>>> On Mon, Feb 19, 2024 at 9:47 AM Linux regression tracking (Thorsten
> >>>> Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 17.02.24 14:30, Greg KH wrote:
> >>>>>> On Sat, Feb 17, 2024 at 02:01:54PM +0100, Roman Benes wrote:
> >>>>>>> Minimum power limit on latest(6.7+) kernels is 190W for my GPU (RX 6700XT,
> >>>>>>> mesa, archlinux) and I cannot get power cap as low as before(to 115W),
> >>>>>>> neither with Corectrl, LACT or TuxClocker and /sys have a variable read-only
> >>>>>>> even for root. This is not of above apps issue but of the kernel, I read
> >>>>>>> similar issues from other bug reports of above apps. I downgraded to v6.6.10
> >>>>>>> kernel and my 115W(under power)cap work again as before.
> >>>>>>
> >>>>> For the record and everyone that lands here: the cause is known now
> >>>>> (it's 1958946858a62b ("drm/amd/pm: Support for getting power1_cap_min
> >>>>> value") [v6.7-rc1]) and the issue afaics tracked here:
> >>>>>
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
> >>>>>
> >>>>> Other mentions:
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3137
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/2992
> >>>>>
> >>>>> Haven't seen any statement from the amdgpu developers (now CCed) yet on
> >>>>> this there (but might have missed something!). From what I can see I
> >>>>> assume this will likely be somewhat tricky to handle, as a revert
> >>>>> overall might be a bad idea here. We'll see I guess.
> >>>>
> >>>> The change aligns the driver what has been validated on each board
> >>>> design.
> >>>> Windows uses the same limits. Using values lower than the
> >>>> validated range can lead to undefined behavior and could potentially
> >>>> damage your hardware.
> >>>
> >>> Thx for the reply! Yeah, I was expecting something along those lines.
> >>>
> >>> Nevertheless it afaics still is a regression in the eyes of many users.
> >>> I'm not sure how Linus feels about this, but I wonder if we can find
> >>> some solution here so that users that really want to, can continue to do
> >>> what was possible out-of-the box before. Is that possible to realize or
> >>> even supported already?
> >>>
> >>> And sure, those users would be running their hardware outside of its
> >>> specifications. But is that different from overclocking (which the
> >>> driver allows, doesn't it? If not by all means please correct me!)?
> >>
> >> Sure. The driver has always had upper bound limits for overclocking,
> >> this change adds lower bounds checking for underclocking as well.
> >> When the silicon validation teams set the bounding box for a device,
> >> they set a range of values where it's reasonable to operate based on
> >> the characteristics of the design.
> >>
> >> If we did want to allow extended underclocking, we need a big warning
> >> in the logs at the very least.
> >
> > Requiring a module-option to be set to allow this, as well as a big
> > warning in the logs sounds like a good solution to me.
>
> Yeah, especially as it sounds from some of the reports as if some
> vendors did a really bad job when it came to setting the proper
> lower-bound limits are now adhered -- and thus higher then what we used
> out-of-the box before 1958946858a62b was applied.
>
> Side note: I assume those "lower bounds checking" is done round about
> the same way by the Windows driver? Does that one allow users to go
> lower somehow? Say after modifying the registry or something like that?
> Or through external tools?

Windows uses the same limit.
I'm not aware of any way to override the limit on windows off hand.

Alex

>
> Ciao, Thorsten
>
> >>>>> Roman posted something that apparently was meant to go to the list, so
> >>>>> let me put it here:
> >>>>>
> >>>>> """
> >>>>> UPDATE: User fililip already posted patch, but it need to be merged,
> >>>>> discussion is on gitlab link below.
> >>>>>
> >>>>> (PS: I hope I am replying correctly to "all" now? - using original addr.)
> >>>>>
> >>>>>
> >>>>>> it seems that commit was already found(see user's 'fililip' comment):
> >>>>>>
> >>>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
> >>>>>> commit 1958946858a62b6b5392ed075aa219d199bcae39
> >>>>>> Author: Ma Jun <Jun.Ma2@xxxxxxx>
> >>>>>> Date:   Thu Oct 12 09:33:45 2023 +0800
> >>>>>>
> >>>>>>     drm/amd/pm: Support for getting power1_cap_min value
> >>>>>>
> >>>>>>     Support for getting power1_cap_min value on smu13 and smu11.
> >>>>>>     For other Asics, we still use 0 as the default value.
> >>>>>>
> >>>>>>     Signed-off-by: Ma Jun <Jun.Ma2@xxxxxxx>
> >>>>>>     Reviewed-by: Kenneth Feng <kenneth.feng@xxxxxxx>
> >>>>>>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> >>>>>>
> >>>>>> However, this is not good as it remove under-powering range too far. I
> >>>>> was getting only about 7% less performance but 90W(!) less consumption
> >>>>> when set to my 115W before. Also I wonder if we as a OS of options and
> >>>>> freedom have to stick to such very high reference for min values without
> >>>>> ability to override them through some sys ctrls. Commit was done by amd
> >>>>> guy and I wonder if because of maybe this post that I made few months
> >>>>> ago(business strategy?):
> >>>>>>
> >>>>>>
> >>>>> https://www.reddit.com/r/Amd/comments/183gye7/rx_6700xt_from_230w_to_capped_115w_at_only_10/
> >>>>>>
> >>>>>> This is not a dangerous OC upwards where I can understand desire to
> >>>>> protect HW, it is downward, having min cap at 190W when card pull on
> >>>>> 115W almost same speed is IMO crazy to deny.
> >>>>> We don't talk about default
> >>>>> or reference values here either, just a move to lower the range of
> >>>>> options for whatever reason.
> >>>>>>
> >>>>>> I don't know how much power you guys have over them, but please
> >>>>> consider either reverting this change, or give us an option to set
> >>>>> min_cap through say /sys (right now param is readonly, even for root).
> >>>>>>
> >>>>>>
> >>>>>> Thank you in advance for looking into this, with regards: Romano
> >>>>> """
> >>>>>
> >>>>> And while at it, let me add this issue to the tracking as well
> >>>>>
> >>>>> [TLDR: I'm adding this report to the list of tracked Linux kernel
> >>>>> regressions; the text you find below is based on a few templates
> >>>>> paragraphs you might have encountered already in similar form.
> >>>>> See link in footer if these mails annoy you.]
> >>>>>
> >>>>> Thanks for the report. To be sure the issue doesn't fall through the
> >>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> >>>>> tracking bot:
> >>>>>
> >>>>> #regzbot introduced 1958946858a62b /
> >>>>> #regzbot title drm: amdgpu: under-powering broke
> >>>>>
> >>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>>> --
> >>>>> Everything you wanna know about Linux kernel regression tracking:
> >>>>> https://linux-regtracking.leemhuis.info/about/#tldr
> >>>>> That page also explains what to do if mails like this annoy you.
> >>>>
> >>>>
> >>
> >
> >
> >
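
For anyone who wants to see which range their own board reports, here is a
minimal sketch. It assumes the amdgpu hwmon attributes power1_cap_min and
power1_cap_max (values in microwatts) under
/sys/class/drm/card0/device/hwmon/hwmon*/. The card name, the helper names
and the 115 W request are illustrative assumptions only; nothing here is
taken from the driver source, and it only reads the limits, it does not
write a cap.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000


def read_microwatts(hwmon_dir, name):
    """Read one hwmon power attribute (reported in microwatts) as an int."""
    return int((Path(hwmon_dir) / name).read_text().strip())


def check_power_cap(hwmon_dir, requested_watts):
    """Return (allowed, min_watts, max_watts) for a requested cap in watts.

    'allowed' is True only if the request lies inside the range the
    driver reports via power1_cap_min / power1_cap_max.
    """
    min_uw = read_microwatts(hwmon_dir, "power1_cap_min")
    max_uw = read_microwatts(hwmon_dir, "power1_cap_max")
    requested_uw = int(requested_watts * MICROWATTS_PER_WATT)
    allowed = min_uw <= requested_uw <= max_uw
    return allowed, min_uw / MICROWATTS_PER_WATT, max_uw / MICROWATTS_PER_WATT


def find_amdgpu_hwmon(card="card0"):
    """Return the first hwmon directory for the given DRM card, or None.

    The 'card0' default is an assumption; multi-GPU systems may differ.
    """
    base = Path("/sys/class/drm") / card / "device" / "hwmon"
    candidates = sorted(base.glob("hwmon*")) if base.is_dir() else []
    return candidates[0] if candidates else None


if __name__ == "__main__":
    hwmon = find_amdgpu_hwmon()
    if hwmon is None:
        print("no hwmon directory found for card0")
    else:
        ok, lo, hi = check_power_cap(hwmon, 115)
        print(f"reported range: {lo:.0f} W .. {hi:.0f} W, 115 W allowed: {ok}")
```

On a board that reports a 190 W minimum, as in the original report, a 115 W
request falls below the reported range, so check_power_cap() returns False
for it.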