On Wednesday 16 August 2006 12:17 am, Ikhwan Lee wrote:
> Hi,
>
> Thank you for your answers.
>
> > > Proper runtime power management requires drivers to change their
> > > devices' power states as needed, with no intervention of the PM
> > > core. Neither power/state nor power.power_state.event is really
> > > necessary for this purpose.
> >
> > That's a key point that I think was not widely understood early on. The
> > driver APIs exist to make sure systems can be cleanly shut down ... not
> > to reduce power usage. At best, that sysfs power/state thing is a big
> > distraction from actually trying to make drivers be power-efficient.
>
> Now I understand that the drivers are responsible for making runtime power
> management decisions for individual devices. But there are cases that a
> device driver does not have enough information to make the most effective
> decision. In such a case, we may want to employ a high level power manager
> to make decisions for them.

And that will need some sort of programming interface. But those
will be providing domain-specific hints and information, not describing
the system-wide power state transitions which are being addressed by
current bus/class/driver suspend()/resume() methods.

Plus, not all drivers will have those cases and thus need new APIs.
Best to let just those drivers pay the costs of such interfaces, and
not try to inflict them universally. :)

> Suppose we have a SoC with an on-chip multimedia codec. The codec can either
> be clock-gated or power-gated, and it shares its power domain with a
> neighboring IP, say a 3D engine. Clock gating can be done by the driver
> since the clock can be controlled separately. However, the codec driver can
> never perform power-gating since it does not know if the 3D engine is active
> or not. We would prefer a centralized device power manager in this case.

I wouldn't say "centralized"; it doesn't need to handle every device
in the system.
All it needs to understand is one power domain. And in at least some
of these cases, a simple refcounted enable/disable API should be able
to manage the power domain ... exactly like such an API already manages
the clock domains. (You can obviously come up with examples where such
refcounting doesn't suffice. But there are a lot where it does, including
the one you described here.)

There was a voltage domain API drafted a while back, by some folk at Nokia.
The draft I saw was incomplete, but looked to be in the right kind of
direction; and it was very much in the flavor of <linux/clk.h> but of
course had to deal with the fact that power domains need to support a
choice of voltages. Example: voltage fed to an MMC/SD card may often
be 3.3V, but there are low voltage cards too; and many power domains can
be modeled simply as "turn on/off the 2.2V supply".

> This is a common case for state-of-the-art mobile handsets such as a DMB
> phone. As a different example, a system-level power manager may want to put
> the codec into a low quality (thus low power) mode regarding the battery
> status. Certainly, this kind of decision cannot be made by the driver.

You've got a bit of chicken-vs-egg going on there, in that you're assuming
the answer is a system/global manager that knows about the codec!!

While in fact there can easily be other solutions. Examples include
notifying the user and letting _them_ choose which module to put into
lowpower/off mode (maybe the WLAN instead, since the codec is more
essential just now), or a general system "cut power usage" notification,
which that driver will interpret in that way.

But I certainly agree that there are cases where "higher level" inputs
are needed to help drivers manage power usage effectively. I consider
most of those to be domain-specific APIs, outside the specific scope of
the PM framework. (But clearly needing to be pm-aware designs.)
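
To make the refcounted enable/disable idea for a single shared power
domain (the codec plus 3D engine case) a bit more concrete, here is a
minimal sketch in plain userspace C. It is only an illustration under
stated assumptions: struct power_domain, pd_init(), pd_enable() and
pd_disable() are hypothetical names, not an existing kernel API, and the
"powered" flag stands in for whatever register write would really switch
the supply. A real implementation would also need locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical power domain shared by several devices. */
struct power_domain {
    const char *name;
    int users;    /* how many drivers currently need the domain on */
    int powered;  /* 1 while the (simulated) supply is switched on */
};

void pd_init(struct power_domain *pd, const char *name)
{
    assert(pd != NULL);
    pd->name = name;
    pd->users = 0;
    pd->powered = 0;
}

/* The first user switches the supply on; later users only bump the count. */
int pd_enable(struct power_domain *pd)
{
    assert(pd != NULL);
    if (pd->users++ == 0)
        pd->powered = 1;  /* stand-in for the real hardware switch */
    return 0;
}

/* The last user switches the supply off; -1 means an unbalanced disable. */
int pd_disable(struct power_domain *pd)
{
    assert(pd != NULL);
    if (pd->users == 0)
        return -1;
    if (--pd->users == 0)
        pd->powered = 0;
    return 0;
}
```

With something of this shape, the codec driver and the 3D engine driver
would each call pd_enable() when they become active and pd_disable() when
they go idle. Neither needs to know whether the other is running; the
domain only loses power once both have dropped their reference.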

> IMO, having support for such use cases in the PM core and exporting
> necessary driver APIs would not be a bad idea. Centralized device power
> manager can keep track of the system power states

A system power state manager should manage/track system power states,
both operating points and sleep states. Board-specific in general;
each SOC will have reusable functionality (lots of potential operating
points), but then so will external chips ... there's lots of variability
in how things get wired up, so a "glue" component will be needed too.

That is: SOC stuff, plus device drivers, plus board glue, plus state
info, plus configuration ... == system manager.

I can't see any point to a manager that deals only with devices, since
how they're managed (and what they are!) must be board-specific and
must be integrated with the SOC stuff for stuff like power and clock
domains, wake event processing, DMA (e.g. to on-SOC SRAM vs DRAM that
might be in self-refresh mode), etc.

> and interdependencies
> among devices (the device model perfectly suits for this) while device
> drivers provide necessary APIs for safe power state transitions.

The device model has glitches in terms of not handling power/voltage
domains at all, or devices that sit on multiple busses. One example
I've seen quite often is an external multifunction chip with a highspeed
serial bus for a codec data link, and a separate serial bus (I2C, SPI,
etc) for its control link and lowspeed non-codec data.

Plus, clusters of interrelated devices aren't handled that well, even
if you assume that clock and power domain APIs will handle those issues;
my pet example being USB-OTG modules, which started out involving five
controllers (host, peripheral, and OTG for USB; plus I2C and external
PHY) before chip vendors started to use more integrated design approaches.

Given the wide variety of possible device power states, I really think
it's best to try to keep the driver model out of that business.

> > See list archives for the "RFC -- updated Documentation/power/devices.txt"
> > thread; one of my last posts there has a version of that document with
> > lots of examples of how runtime power saving works; it does NOT need to
> > involve any kind of public power state updating. Things like cpufreq and
> > dynamic tick, or power-aware idle tasks, don't need to change externally
> > visible state any more than per-device power saving policies do.
>
> I have been following the thread. I especially like the section on runtime
> power management, with lots of examples. I am actually working on some of
> them, and my claim is that (as stated above) we may need to involve some
> kind of public power state updating in a system-wide way.

I can't disagree with that at all. :)

One of my concerns is how to factor that stuff well -- "architect" it --
so that the approach is broadly applicable. A factoring that works well
at the SOC level may not work as well at the board level; working well on
one family of SOCs doesn't mean it works well on others.

I feel comfortable saying we need a power/voltage domain framework, and
an extensible framework for system-wide operating (and non-operating
sleep) states. The rest looks to me like stuff that would best be worked
out over time, while integrating those two things and fixing the inevitable
botches in the initial designs, as applied to current hardware.

- Dave