On Sun, Mar 21, 2010 at 09:58:44PM +0100, Daniel Lezcano wrote:
> Serge E. Hallyn wrote:
> > Quoting Jamie Lokier (jamie@xxxxxxxxxxxxx):
> >
> >> Matt Helsley wrote:
> >>
> >>>> That said, if the intent is to allow the restore to be done on
> >>>> another node with a "similar" filesystem (e.g. created by rsync/node
> >>>> image), instead of having a coherent distributed filesystem on all
> >>>> of the nodes then the filename makes sense.
> >>>>
> >>> Yes, this is the intent.
> >>>
> >> I would worry about programs which are using files which have been
> >> deleted, renamed, or (very common) renamed-over by another process
> >> after being opened, as there's a good chance they will successfully
> >> open the wrong file after c/r, and corrupt state from then on.
> >>
> >
> > Userspace is expected to back up and restore the filesystem, for
> > instance using a btrfs snapshot or a simple rsync or tar.
> >
> That does not solve the problem Jamie is talking about.
> A rsync or a tar will not see a deleted file and using a btrfs to have
> the CR to work with the deleted files is a bit overkill, no ?

These are the same kinds of problems encountered during backup. You can
play fast and loose (for example, taking a backup while everything is
running) or you can play it conservative and freeze things.

I think btrfs snapshots are just one possible solution, and they're not
overkill. For some filesystems it might make sense to use the filesystem
freezer to ensure that no files are deleted while the backup takes
place. Combined with tools like rsync or rdiff-backup, these operations
could be low-bandwidth and low-latency if well-known live-migration
techniques are used. Or use dm snapshots.

I imagine fanotify could also be useful, so long as userspace has marked
things correctly prior to checkpoint. My high-level understanding of
fanotify is that we'd be able to delay (or deny) deletion until the
checkpoint is complete.

Or, if using fanotify is unacceptable, at the very least we could use
inotify to know when a file needed for restart has been deleted. It
might go something like:

    start watching the files/dirs needed (fanotify or inotify)
    delay/deny changes (fanotify ONLY)
    freeze tasks for checkpoint
    freeze filesystem contents:
        take btrfs snapshots OR
        take dm snapshots OR
        use the filesystem freezer OR
        back up filesystem contents
    sys_checkpoint
    check for changes to the filesystem contents and report failure
        if they interfere with restart (inotify ONLY)
    thaw filesystem contents
    thaw tasks

So there are lots of possible solutions, and they don't all involve
trying to stop the whole VFS or the whole machine. They also don't
require anything more in-kernel than what's already being pushed (our
patchset, and Eric Paris' patchset for the optional fanotify idea).

> I have another question about the deleted files. How is handled the case
> when a process has a deleted mapped file but without an associated file
> descriptor ?

The mapped file holds a struct file reference in the VMA. When
checkpoint walks the VMAs, that struct file is visited just like the
struct files reached from file descriptors.

Cheers,
-Matt Helsley