On Thu, 19 Dec 2024 at 22:21, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
>
> On 12/19/2024 15:10, Antheas Kapenekakis wrote:
> > On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>
> >> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
> >>> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
> >>>>> Hi Mario,
> >>>>> given that there is a Legion Go driver in the works, and Asus already
> >>>>> has a driver, the only thing that would be left for locking down ACPI
> >>>>> access is manufacturers w/o vendor APIs.
> >>>>>
> >>>>> So, can we restart the conversation about this driver? It would be
> >>>>> nice to get to a place where we can lock down /dev/mem and ACPI by
> >>>>> spring.
> >>>>
> >>>> As Shyam mentioned we don't have control for limits by the PMF driver
> >>>> for this on PMF v2 (Strix) or later platforms.
> >>>>
> >>>> So if we were to revive this custom discussion it would only be for
> >>>> Phoenix and Hawk Point platforms.
> >>>
> >>> That's unfortunate.
> >>>
> >>>>>
> >>>>> Moreover, since the other two proposed drivers use the
> >>>>> firmware_attributes API, should this be used here as well?
> >>>>
> >>>> I do feel that if we revive this conversation specifically for Phoenix
> >>>> and Hawk Point platforms yes we should use the same API to expose it to
> >>>> userspace as those other two drivers do.
> >>>>
> >>>> I'd like Shyam's temperature on this idea though before anyone spends
> >>>> time on it. If he's amenable would you want to work on it?
> >>>
> >>> We currently expect the 2025 lineup to include a lot of Strix Point
> >>> handhelds, so I'd like a solution that works with that. OneXPlayer
> >>> released a model already, and GPD is getting ready to ship as well.
> >>>
> >>> Yeah, I could throw some hours to it after I go through some overdue stuff.
> >>>
> >>>>>
> >>>>> By the way, you were right about needing a taint for this. Strix Point
> >>>>> fails to enter a lower power state during sleep if you set it to lower
> >>>>> than 10W. This is not ideal, as hawk point could go down to 5 while
> >>>>> still showing a power difference, but I am unsure where this bug
> >>>>> should be reported. This is both through ryzenadj/ALIB
> >>>>
> >>>> Who is to say this is a bug? Abusing a debugging interface with a
> >>>> reverse engineered tool means you might be able to configure a platform
> >>>> out of specifications.
> >>>
> >>> The spec being 10+W would be very undesirable for handhelds with Strix
> >>> Point, so I'd hope somebody looks into it, esp. if it can be fixed
> >>> with a BIOS fw update before more handhelds come out. I can raise the
> >>> minimum TDP to 10W, with some user complaints.
> >>>
> >>> Asus and Lenovo use the same mailbox so they'd share the issue too.
> >>>
> >>> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
> >>> results in around 20-22W total consumption which is around 2.5 hours.
> >>> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
> >>> and can go down to 8W with frame limiting etc. I do not have numbers
> >>> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
> >>> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
> >>> at low wattages.
> >>>
> >>
> >> I've got a fundamental question - why the fixation on PPT?
> >>
> >> This just sets "limits" for the package. In Windows it's probably the
> >> best knob to tune to adjust performance in an effort to extend battery
> >> life, but in Linux we have a lot of other knobs:
> >>
> >> * the ability to tune EPP (energy_performance_preference)
> >> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
> >
> > We use both of these.
> >
> >> * offline cores at will
> >
> > if a core is parked and you try to write into its sysfs entrypoints,
> > we found that this might cause a userspace program to hang
> > indefinitely. Since a lot of settings are per core that's problematic
> > and since it does not help much most TDP programs dont offer it
> > anymore.
>
> This sounds like a kernel bug if you're hanging programs when trying to
> write to sysfs files of offlined cores. If we can get that fixed having
> that in your toolbelt is quite useful. I'm sure there are plenty of
> games that don't really need all the cores up and you can save some power.
>
> Can you get a simple reproducer for me into a bug report to look at next
> year?

I will try to. This was relayed to me. Disabling SMT also causes a crash
on the Ally when going to sleep.

> >
> >> * change DPM setting in the GPU driver (power_dpm_force_performance_level)
> >
> > I think we played with this mostly to try to get lower than 800mhz.
> > However, going lower than 800mhz in these APUs causes issues.
> >
> >> All the core related knobs can be changed on a per-core basis. So for
> >> example even on a non-heterogeneous design you could potentially make it
> >> perform "like" a hetero design where you set it so that some cores don't
> >> go above nominal frequency or the EPP value is tuned less aggressively
> >> on some cores.
> >
> >> These knobs can have just as drastic of a result on battery life as
> >> adjusting the various power limiting knobs. Most importantly these
> >> knobs have architectural limits that you won't be able to override so
> >> you can safely change them to min/max and see what happens.
> >
> > I feel like we are discussing different targets here. When it comes to
> > computing tasks, you have a certain block of work that needs to be
> > done and after that the CPU is free.
> > In this case, programs like tuned
> > (allegedly) optimize these settings so that they take the minimum
> > amount of power to complete that block of work.
> >
> > However, games are different. Games have no problem burning power if
> > you let them and they are also playable at a variety of power levels.
> > Typically, unless the user caps the framerate and video quality of the
> > game it will use the full slow temp limit value. Even if they do set
> > that, the game will typically burn 3-4W more than what is needed
> > depending on TDP, EPP etc.
>
> Part of what I'm wondering is if our 4 levels of EPP values "aren't
> enough" for optimization on a per game basis.
>
> IMO They're incredibly rigid. I do have a patch that can expose "raw"
> numbers for amd-pstate like intel-pstate does, but I haven't brought it
> on the lists yet because I'm still discussing it with others internal to
> AMD.
>
> EPP is really about responsiveness in games.

EPP performance is so detrimental that we hide it. It destroys
performance by pulling power away from the GPU. EPP balance_performance
is only useful in certain emulators that need a lot of CPU. Otherwise,
only balance_power is useful. Then, for TDPs lower than 10W, setting EPP
to power saves another 1-2W.

> >
> > Therefore, the question we ask users is how loud do you want your
> > device to be and how long (in hours) do you want the battery to last.
> > This is done by ppt + the other settings, which are set automatically
> > based on ppt.
> >
> > Then the users can compromise with what fidelity and fps they get
> > based on their TDP.
> >
> >> I feel like specifically if you keep EPP at balance_performance, keep
> >> scaling_min_freq at lowest non linear frequency and change
> >> scaling_max_freq on a few of the cores you should be able to influence
> >> the battery life quite a bit while still keeping the system responsive.
> >
> > The sweet spot on these APUs is max freq to be nominal (i.e., no
> > boost), EPP at balance_power or power for very low tdps, and min_freq
> > to be 0. Especially for min_freq, setting it to lowest non-linear
> > seems to have no effect.
>
> Min freq can't go to 0, but maybe you mean the unitless perf value of 0,
> right?
>
> There is a pretty big current swing you'll have going from perf 0 vs the
> swing you get from lowest nonlinear perf. It might not be visually
> noticeable, but I think it would be good to characterize how many joules
> are used for a given predictable gaming "workload" to decide what to do.

I set min_freq for the CPUs to the minimum available value, which, as
you suggest, is probably 400 MHz rather than 0. When it comes to games,
1.4 GHz (lowest nonlinear) vs 400 MHz does not make much of a
difference.

Antheas
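
For reference, the per-core settings discussed above could be applied
roughly as in the sketch below. It is only an illustration, not what any
existing TDP tool does: the helper names online_cpus() and apply_profile()
are made up, and the caller is assumed to supply the nominal frequency in
kHz. The sysfs attribute names (energy_performance_preference,
scaling_min_freq, scaling_max_freq, cpuinfo_min_freq, online) are the
standard cpufreq/CPU hotplug ones. Offlined cores are skipped, since
writing to their entries was the case reported to hang.

```python
from pathlib import Path
import re

CPU_ROOT = Path("/sys/devices/system/cpu")  # standard CPU sysfs location


def online_cpus(root=CPU_ROOT):
    """Return the cpuN directories that are online and have a cpufreq policy."""
    cpus = []
    for cpu in sorted(Path(root).glob("cpu[0-9]*")):
        if not re.fullmatch(r"cpu\d+", cpu.name):
            continue
        online = cpu / "online"
        # cpu0 often has no 'online' file; treat a missing file as online.
        if online.exists() and online.read_text().strip() != "1":
            continue
        if (cpu / "cpufreq").is_dir():
            cpus.append(cpu)
    return cpus


def apply_profile(epp, max_khz, root=CPU_ROOT):
    """Set EPP, the lowest min frequency and a max frequency on online CPUs.

    Hypothetical helper: returns the names of the CPUs it wrote to.
    """
    touched = []
    for cpu in online_cpus(root):
        freq = cpu / "cpufreq"
        lowest = (freq / "cpuinfo_min_freq").read_text().strip()
        # Lower the floor first so the new ceiling is never below it.
        (freq / "scaling_min_freq").write_text(lowest)
        (freq / "scaling_max_freq").write_text(str(max_khz))
        (freq / "energy_performance_preference").write_text(epp)
        touched.append(cpu.name)
    return touched
```

On a real system this needs root, and the writes can be rejected by the
driver (for example, depending on the active governor), so a caller would
want to handle OSError. The root parameter exists only so the logic can be
tried against a scratch directory instead of the live sysfs tree.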