Hi,

On Fri, Apr 23, 2021 at 03:24:51PM -0700, Sowjanya Komatineni wrote:
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> >
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> > > Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> > > which is in charge of deciding on state transition based on target
> > > state, state idle time, and some other Tegra CPU core cluster states
> > > information.
> > >
> > > Current PSCI specification don't have function defined for passing
> > > runtime state idle time predicted by governor (based on next events
> > > and state target residency) to ARM trusted firmware.
> >
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
>
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
>
> >
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> >
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be at
> > least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
>
> MCE firmware gets state crossover thresholds for C1 to C6 transition from
> TF-A and uses it along with state idle time to decide on C6 state entry
> based on its background work.
>
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
>
> But MCE firmware is not aware of when next cpu event can happen to predict
> if next event can take longer than state min_residency time.
>
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
>
> But with MCE firmware being aware of when next event can happen it can use
> that to stay in C6 state without early exit for better power savings.
>
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of
> > linear scale:
> > 4-bits encoding idle state and max predicted residency 10msec,
> > that means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> >
> > Open question is if these idle states must be all represented
> > in DT, or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and does
> not require MCE sequences for entry/exit.

I think Lukasz's point is that you can encode the predicted idle time by
having multiple idle_state entries with different min_residency mapping
to the same actual idle-state. So you would have several variants of C6
with different min_residencies, and if the OS picks one with a longer
min_residency, firmware would have a better estimate of the predicted
idle residency.
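
To make the encoding concrete, here is a minimal sketch of the linear
scale from Lukasz's example (4 bits, 10 ms maximum predicted residency).
All function names are hypothetical and only illustrate the idea; this
is not kernel, DT, or MCE firmware code, and a real table would likely
start at the actual C6 min_residency rather than at 0.

```python
# Illustrative only: hypothetical helpers, not an existing kernel or firmware API.

def variant_min_residencies(max_residency_us: int, bits: int) -> list[int]:
    """Linear scale: min_residency (in us) for each of the 2**bits C6 variants."""
    count = 1 << bits
    step = max_residency_us // count  # 10000 us / 16 variants = 625 us per variant
    return [index * step for index in range(count)]


def pick_variant(predicted_us: int, min_residencies: list[int]) -> int:
    """OS side: deepest variant whose min_residency does not exceed the prediction."""
    chosen = 0
    for index, min_residency in enumerate(min_residencies):
        if min_residency <= predicted_us:
            chosen = index
    return chosen


def firmware_lower_bound(variant: int, min_residencies: list[int]) -> int:
    """Firmware side: the variant id alone implies a lower bound on idle time."""
    return min_residencies[variant]


table = variant_min_residencies(10_000, 4)
variant = pick_variant(3_000, table)
print(variant, firmware_lower_bound(variant, table))
```

With a 3000 us prediction the OS would request variant 4, so firmware
learns only that the predicted idle time is at least 2500 us. The
resolution is bounded by the 625 us step.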

I'm not convinced it is the right way to work around passing this
information on to firmware. I would rather see an example of how well
this works (best with numbers) and have a proper solution.

Morten