/me puts ego on hold and tries to be constructive without ranting...

> in my current tree I have
>
>  - suspend_prepare (I went with Ben's name, maybe that strokes his ego
>    enough that he'll admit it's better now)

Heh.

>  - suspend (same as old)

Ok. Well, most of my latest burst was about blocking of incoming
"requests" but we can discuss that separately. Indeed, just adding the
other calls doesn't break anything as it is.

>  - suspend_late

Ok, so this is a cleanup over the old stuff we had for returning a
special error from suspend to be called again later with interrupts
off. I agree it sucked, though I never actually used it. Better to have
it well defined this way.

Now whether drivers shall use it, and when they shall do so, is a
different question :) (Obviously, not drivers that rely on a complex
parent bus like USB, firewire, etc... but more like PCI drivers, though
there is also the problem of how that "suspend_late" fits in the
context of dynamic PM in a live system.) But we can re-discuss that
later.

>  - resume_early

Same as above.

>  - resume (same as old)
>
> (and I really wanted to do a "resume_finish()" too after user-land resume,
> just to have the "reverse" three phases of resume as I have of suspend,
> but I decided I didn't have any driver that I would make use of it
> personally)

This one will be needed as soon as we tackle the problem of devices
that do request_firmware and/or communicate with userland. I already
have at least one user for it on powerpc, which is the APM emulation (I
emulate /dev/apm_bios for the few userland things that do care about
suspend/resume).

I think most wireless drivers that need firmware should be fixed to use
prepare/finish: preload the firmware in memory in prepare, and get rid
of that preloaded image in finish. That way, their resume can use the
preloaded firmware rather than deadlock/fail in request_firmware()
because userland isn't in a state where it can service it.
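The prepare/finish firmware idea can be sketched in plain userspace C
roughly like this. It is only an illustration under loud assumptions:
every fake_* name is hypothetical, fake_request_firmware() merely
stands in for request_firmware(), and nothing here is taken from a real
driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a wireless driver's private state. */
struct fake_wifi {
    char  *fw_cache;     /* firmware image preloaded before suspend */
    size_t fw_len;
    int    userland_up;  /* 1 while userland can service firmware requests */
    int    hw_running;   /* 1 once the chip has been (re)started */
};

/* Stand-in for request_firmware(): only succeeds while userland is alive. */
static int fake_request_firmware(struct fake_wifi *w, char **img, size_t *len)
{
    static const char blob[] = "firmware-image";

    assert(w != NULL);
    if (!w->userland_up)
        return -1;               /* the real call would fail or deadlock here */
    *img = malloc(sizeof(blob));
    if (*img == NULL)
        return -1;
    memcpy(*img, blob, sizeof(blob));
    *len = sizeof(blob);
    return 0;
}

/* prepare phase: userland still runs, so preload the image now. */
static int fake_suspend_prepare(struct fake_wifi *w)
{
    return fake_request_firmware(w, &w->fw_cache, &w->fw_len);
}

/* resume phase: userland is not back yet, only the cached image is usable. */
static int fake_resume(struct fake_wifi *w)
{
    assert(w != NULL);
    if (w->fw_cache == NULL)
        return -1;               /* nothing preloaded, cannot restart the chip */
    w->hw_running = 1;           /* pretend fw_cache was uploaded to the chip */
    return 0;
}

/* finish phase: userland is back, drop the preloaded image. */
static void fake_resume_finish(struct fake_wifi *w)
{
    assert(w != NULL);
    free(w->fw_cache);
    w->fw_cache = NULL;
    w->fw_len = 0;
}
```

The point of the sketch is only the ordering: the one call that needs
userland happens in the prepare phase, and resume never calls it.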
First candidate for me here is bcm43xx.

There is also my idea that bus drivers could stop inserting new devices
after prepare(). It's not something I'm necessarily very firm on, just
an idea that I thought might make life easier, but it can definitely be
debated.

> > One thing that might help us get there is if we passed a suspend notification
> > to the class devices (i.e. the higher level subsystems).
>
> Good point. We probably should. That really really makes sense, and that
> also automagically solves the "network device" issue.
>
> I'll do that too, it actually looks pretty simple (famous last words).

Yes, that would definitely be a good thing, though while adding the
callback is simple, when to call it is not... (or rather is not with
the current implementation).

It seems to me that class devices are essentially the children of the
device as far as PM is concerned (suspended before the device and
resumed after). Thus they should be inserted in the PM tree at the
right place. Right now, they are not.

I wonder if we shall bite the bullet and finally go for a completely
separate PM "tree" structure (or worse, a dependency graph that some
embedded people ask for, but I dislike it). Right now, we have a list
and we hope we always insert things at the right place. Not sure it can
accommodate class devices though.

> > I'm curious about your thoughts on runtime suspending of devices are, such as
> > the resource rebalancing or cpufreq cases I suggested earlier.
>
> I really don't see that as my primary worry. Runtime suspend is "nice",
> but it's not a _primary_ goal for me.

Ok. It's been one for embedded and handheld folks lately though, and it
is necessary for a few things today, like shutting down your wireless
interface in a plane (yeah, stupid, but heh !). In most cases, it can
be handled totally locally to a given driver though.
But we have been looking into making it better by properly using the PM
core to "escalate" power state changes of drivers, allowing things like
entire busses to be unclocked when all devices on them are off, that
sort of thing.

> I think it should be pretty easy to implement, and I think your subsystem
> suspend notification thing would help a lot (to basically guarantee that
> the subsystem doesn't try to use it).

Yes. Though we are talking about two slightly different things: class
devices and subsystems. In the first case, we have an entity that could
be considered a functional child of the device (netdev class devices
etc...) and gets called before it. In the latter case, we have a
subsystem routine that is explicitly called by the driver at suspend to
ask the subsystem to leave it alone. Unless you want to suspend all
subsystems before you suspend all drivers, but I'm not sure that won't
lead to various sorts of problems where subsystems are part of a
transport layer needed by some drivers to suspend...

But it's essentially the same idea. That is definitely a good way to
split suspend() and make it safer, because it would provide the proper
blocking of requests etc... that I'm so big about, at the subsystem or
class device layer.

In fact, it's more or less how I did IDE back then (not with class
devices but by having 2 separate devices for the disk and the
controller; sounds logical today, wasn't back then given the state the
IDE layer was in). The disk gets suspended first, then the controller.
By the time the controller suspend is called, it doesn't have to worry
about requests or anything like that, it just changes the power state.
The disk driver gets the complicated logic of blocking queues, sending
spindown commands, etc... Which is cool: there is _one_ disk driver to
debug and dozens of controller drivers.

That sort of split, I'm all about.
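The disk/controller split can be illustrated with a small userspace C
sketch. It is not the IDE code: all fake_* names are hypothetical, and
"draining the queue" is reduced to resetting a counter. It only shows
that the controller side becomes trivial once the disk side has blocked
requests first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request queue shared by a disk and its controller. */
struct fake_queue {
    int blocked;     /* 1 once no new requests are accepted */
    int in_flight;   /* requests still being processed */
};

struct fake_disk {
    struct fake_queue *q;
    int spun_down;
};

struct fake_controller {
    struct fake_queue *q;
    int power_state;     /* 0 = fully on, 3 = powered down */
};

/* The disk driver owns the complicated part: block, drain, spin down. */
static int fake_disk_suspend(struct fake_disk *d)
{
    assert(d != NULL && d->q != NULL);
    d->q->blocked = 1;
    d->q->in_flight = 0;     /* stand-in for waiting until the queue drains */
    d->spun_down = 1;        /* stand-in for sending the spindown command */
    return 0;
}

/* The controller only changes power state; it refuses if called too early. */
static int fake_controller_suspend(struct fake_controller *c)
{
    assert(c != NULL && c->q != NULL);
    if (!c->q->blocked || c->q->in_flight > 0)
        return -1;           /* ordering bug: the disk was not suspended first */
    c->power_state = 3;
    return 0;
}

/* Resume runs in the reverse order: controller first, then the disk. */
static int fake_controller_resume(struct fake_controller *c)
{
    assert(c != NULL);
    c->power_state = 0;
    return 0;
}

static int fake_disk_resume(struct fake_disk *d, const struct fake_controller *c)
{
    assert(d != NULL && d->q != NULL && c != NULL);
    if (c->power_state != 0)
        return -1;           /* cannot talk to the disk through a dead controller */
    d->spun_down = 0;
    d->q->blocked = 0;
    return 0;
}
```

With dozens of controller drivers, each one only needs the few lines of
fake_controller_suspend(); the queue handling lives in one place.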
To be clear, the split I mean is not splitting suspend() into different
sub-callbacks to the same driver (which, for the various reasons I
already went on too much about, I think isn't necessarily a solution),
but splitting the functionality between different drivers.

Network is definitely something we could handle in part by having
suspend/resume at the generic eth level (netdev class device). There
would still be a little care to take in drivers about things like
ioctls (for those who still take these, though I suppose even there the
netdev layer might be able to block them) and about drivers that have
their own timers/workqueues/threads to do link management (though we
have been working toward a generic PHY layer that makes the various
PHYs separate drivers, so heh, here again, we _can_ split the
complicated work, not within a driver but between layers of drivers).

That doesn't necessarily fix the main debuggability problem, which is
the console, though. fbdev will have a hard time being suspended "late"
because it needs to take the console semaphore to do the suspend
safely, and it's difficult to do so with interrupts disabled (you can
try to get it, but you can't just call acquire_console_semaphore,
unless you go silencing a lot of the atomicity warnings we have all
over the place).

I suppose pure PCI network drivers could suspend "late" using your
second callback mechanism, thus allowing netconsole to survive a bit
longer, though as I mentioned earlier, that scheme doesn't quite fit
with the needs of runtime/dynamic PM... at least if the driver
_assumes_ it has interrupts off.

However, we could just do a 2 pass mechanism instead, with the second
pass still not having irqs off, but having shut down all clients of
"directly mapped" devices (PCI etc...) and thus letting those be
suspended _after_ all the others.
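A small userspace C sketch of that 2 pass ordering, under stated
assumptions: the per-device directly_mapped flag and all fake_* names
are hypothetical, and the device list is assumed to be kept in
registration order (parents before children). Neither pass disables
interrupts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record, for illustration only. */
struct fake_dev {
    const char *name;
    int directly_mapped;   /* 1 for PCI-style devices, 0 if it needs an upper transport */
    int suspended;
    int order;             /* position in which it was suspended, -1 if never */
};

/*
 * Suspend every device in two passes, both with interrupts still on.
 * Pass 0 handles devices that depend on a transport or another driver,
 * pass 1 handles the directly mapped ones, so those always go last.
 * Returns the number of devices suspended.
 */
static int fake_suspend_all(struct fake_dev *devs, size_t n)
{
    int next = 0;

    assert(devs != NULL || n == 0);
    for (int pass = 0; pass < 2; pass++) {
        /* Walk backwards so later-registered devices (children) go first. */
        for (size_t i = n; i-- > 0; ) {
            if (devs[i].directly_mapped != pass)
                continue;
            devs[i].suspended = 1;
            devs[i].order = next++;
        }
    }
    return next;
}
```

A third, interrupts-off pass for low level system devices would simply
be another iteration with its own flag value; it is left out here.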
In our above examples, the first pass would do:

 - usb devices, firewire devices, all devices depending on an upper
   transport driver basically
 - the class devices like netdevs (maybe with tweaks so that netconsole
   is still operational via hacks in the driver though)

And the second pass would do:

 - pci devices (network drivers typically, fbdevs)
 - pci bridges

In addition, we might want this "irq off" pass for low level system
things (like the PICs themselves) or broken legacy devices. Could be a
3rd pass.

Right now, we have both the dodgy "return that error from suspend to be
called later with irqs off" hack _and_ the sysdevs. I hate the sysdevs
because they are just a duplicate of some of the struct device logic
under another name, and just don't fit well in the picture. I'd rather
have had a separate callback in struct device and have them be normal
struct devices. They've also been abused by cpufreq, which regularly
causes problems with suspend. So cleanup in that area is welcome.

Now there is still the question of how things like usb controllers
would fit in the above picture. Different problems. USB has its own
issues, and it might itself want to be split between a toplevel that is
suspended in the first pass (request processing etc.) and a bottom
level that is handled in the second pass (actual controller D3).

> > Do you have any opinions on how this might be handled? So far, I've
> > been favoring usage of the same sort of freeze() mechanism used for
> > preparing for memory snapshots etc.
>
> Let me reboot my current kernel to test my current five-phase thing, and
> I'll do the subsystem thing too.
>
> My off-the-cuff plan for that is to just add a "suspend(dev, state)"
> callback to the subsystem structure, and have device_suspend() call the
> subsystem suspend function before it even calls the actual device suspend
> function (and in reverse order on resume, of course).
>
> Again - I'm not actually planning on doing very many individual drivers
> (that's the point I _don't_ care about), I want the support infrastructure
> to be sane.
>
> (That, btw, obviously indirectly means that I'm not willing to break
> existing drivers - my infrastructure is strictly a _superset_ of what they
> get now).
>
>                 Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm at lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
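For reference, the quoted plan (subsystem suspend called before the
device suspend, reverse order on resume) can be sketched in userspace C
as below. This is an illustration under assumptions, not the actual
patch: the fake_* and demo_* names are hypothetical, the structures are
not the kernel's, and undoing the subsystem step when the driver's
suspend fails is a choice made for this sketch only.

```c
#include <assert.h>
#include <string.h>

#define TRACE_MAX 128
static char trace[TRACE_MAX];    /* records the order in which callbacks ran */

static void trace_add(const char *s)
{
    /* Room needed: existing text, one separator, the new word, the NUL. */
    if (strlen(trace) + strlen(s) + 2 > TRACE_MAX)
        return;
    if (trace[0] != '\0')
        strcat(trace, " ");
    strcat(trace, s);
}

struct fake_device;

/* Hypothetical subsystem structure carrying the extra callbacks. */
struct fake_subsys {
    int (*suspend)(struct fake_device *dev, int state);
    int (*resume)(struct fake_device *dev);
};

struct fake_device {
    const char *name;
    struct fake_subsys *subsys;      /* may be NULL */
    int (*suspend)(struct fake_device *dev, int state);
    int (*resume)(struct fake_device *dev);
};

/* Subsystem first, then the driver itself. */
static int fake_device_suspend(struct fake_device *dev, int state)
{
    int err;

    assert(dev != NULL);
    if (dev->subsys && dev->subsys->suspend) {
        err = dev->subsys->suspend(dev, state);
        if (err)
            return err;
    }
    if (dev->suspend) {
        err = dev->suspend(dev, state);
        if (err) {
            /* Assumption of this sketch: undo the subsystem step on failure. */
            if (dev->subsys && dev->subsys->resume)
                dev->subsys->resume(dev);
            return err;
        }
    }
    return 0;
}

/* Reverse order: the driver comes back first, then the subsystem. */
static int fake_device_resume(struct fake_device *dev)
{
    int err;

    assert(dev != NULL);
    if (dev->resume) {
        err = dev->resume(dev);
        if (err)
            return err;
    }
    if (dev->subsys && dev->subsys->resume)
        return dev->subsys->resume(dev);
    return 0;
}

/* Demo callbacks that only record that they were called. */
static int demo_subsys_suspend(struct fake_device *d, int s)
{ (void)d; (void)s; trace_add("subsys_suspend"); return 0; }
static int demo_subsys_resume(struct fake_device *d)
{ (void)d; trace_add("subsys_resume"); return 0; }
static int demo_dev_suspend(struct fake_device *d, int s)
{ (void)d; (void)s; trace_add("dev_suspend"); return 0; }
static int demo_dev_resume(struct fake_device *d)
{ (void)d; trace_add("dev_resume"); return 0; }
```

Existing drivers that have no subsystem callbacks take the NULL paths
and see exactly the calls they get today, which matches the "strict
superset" requirement in the quoted text.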