On Monday 13 March 2006 23:08, Pavel Machek wrote:
> > > > Yep, I call that suspend-to-both. It is planned, but not really
> > > > trivial, and I'm a little busy. If someone wants to help....
> > >
> > > I was thinking a few days ago. With your move of all this stuff to
> > > userspace, if it was done in multiple stages, we could implement
> > > a form of checkpointing this way.
> > >
> > > So instead of doing the 'suspend to disk/ram' after 'write out all pages',
> > > we just continue.
> > >
> > > Why is this useful ? We've seen bugs reported that only ever bite
> > > customers after they've run their workload for a month. Now, if they had a
> > > means of checkpointing, then when it crashes, they could capture the last
> > > image that landed somewhere, and set that up for more tests/monitoring with
> > > kprobes etc and reproduce those hard-to-reproduce bugs a lot faster.
> >
> > I've been asked about this from time to time too. Apart from the issues Pavel
> > has already mentioned, the big problem in my mind was figuring out what to do
> > about disk storage. As the algorithm stands at the moment, the image includes
> > information about the state of mounted filesystems. We'd need to somehow get
> > rid of or be able to ignore that. Any suggestions?
>
> Well, copying all the filesystems would work, as would having no
> filesystems at all :-) [ramdisk case]. And perhaps practical
> equivalent of "copy all filesystems" can be done with device mapper.
>
> [Of course, you'd have to copy all the filesystems back before doing
> resume].

If we had anything like fs suspend/resume, we could handle such things. We could
also handle the "USB device mounted before suspend" problem (I think it's related).

Greetings,
Rafael