[linux-pm] Re: standby to disk transition

rjw at sisk.pl (Rafael J. Wysocki) · Tue Mar 14 10:13:46 2006

On Tuesday 14 March 2006 01:18, Nigel Cunningham wrote:
> On Tuesday 14 March 2006 09:36, Rafael J. Wysocki wrote:
> > On Tuesday 14 March 2006 00:11, Nigel Cunningham wrote:
> > > On Tuesday 14 March 2006 08:42, Rafael J. Wysocki wrote:
> > > > On Monday 13 March 2006 23:08, Pavel Machek wrote:
> > > > > > >  > Yep, I call that suspend-to-both. It is planned, but not
> > > > > > >  > really trivial, and I'm a little busy. If someone wants to
> > > > > > >  > help....
> > > > > > >
> > > > > > > I was thinking a few days ago. With your move of all this stuff
> > > > > > > to userspace, if it was done in multiple stages, we could
> > > > > > > implement a form of checkpointing this way.
> > > > > > >
> > > > > > > So instead of doing the 'suspend to disk/ram' after 'write out
> > > > > > > all pages', we just continue.
> > > > > > >
> > > > > > > Why is this useful?  We've seen bugs reported that only ever
> > > > > > > bite customers after they've run their workload for a month.
> > > > > > > Now, if they had a means of checkpointing, then when it crashes,
> > > > > > > they could capture the last image that landed somewhere, and set
> > > > > > > that up for more tests/monitoring with kprobes etc and reproduce
> > > > > > > those hard-to-reproduce bugs a lot faster.
> > > > > >
> > > > > > I've been asked about this from time to time too. Apart from the
> > > > > > issues Pavel has already mentioned, the big problem in my mind was
> > > > > > figuring out what to do about disk storage. As the algorithm stands
> > > > > > at the moment, the image includes information about the state of
> > > > > > mounted filesystems. We'd need to somehow get rid of or be able to
> > > > > > ignore that. Any suggestions?
> > > > >
> > > > > Well, copying all the filesystems would work, as would having no
> > > > > filesystems at all :-) [ramdisk case]. And perhaps a practical
> > > > > equivalent of "copy all filesystems" can be done with device mapper.
> > > > >
> > > > > [Of course, you'd have to copy all the filesystems back before doing
> > > > > resume].
> > > >
> > > > If we had anything like fs suspend/resume, we could handle such things.
> > > > We could also handle the "USB device mounted before suspend" problem
> > > > (I think it's related).
> > >
> > > Well, we have bdev freezing, which I guess is what is used for fixing up
> > > raid mirrors (but don't know for certain). I use it in refrigerating to
> > > get XFS to really stop activity. I don't think it helps in this case
> > > though:
> >
> > I don't think so either.
> >
> > > We need to be able to roll back the state of the filesystem in memory and
> > > on disk to the point where the last checkpoint was made. Memory would be
> > > straightforward if we want to do it dumbly and slowly - just reload the
> > > whole checkpointed image. If we want to be more efficient, we'd want to
> > > just load the pages that had changed (mark on (first) write?). But
> > > filesystems seem to be a whole different story. Do any of the commonly
> > > used filesystems have support for checkpointing and rollback at the moment?
> >
> > I'm not sure if we need a rollback as such.  What we need is to make sure
> > the filesystems' state will be consistent before as well as after we have
> > "reloaded" the snapshot.
> 
> Rereading what I think was Dave's original comment above (bug reports that
> only bite customers...), I think the requirement is to be able to roll back
> the entire system to the checkpoint - not merely ensure it's consistent, but
> ensure it's the same so that (all other things being equal), the bug could be
> reproduced with the extra instrumentation in place. Having a filesystem that
> was consistent but (say) discarded the inodes and dentries in memory at the
> time of the checkpoint might be throwing away the very data required to
> reproduce the bug.

Right, but it would still be useful for tracing bugs that are not related to
filesystems, I think.  Moreover, it would also be useful for other purposes
(the USB devices problem, retrying a resume after fixing some hardware).

Greetings,
Rafael