Oren Laadan wrote:
>
>
> Daniel Lezcano wrote:
>> Serge E. Hallyn wrote:
>>> Quoting Jamie Lokier (jamie@xxxxxxxxxxxxx):
>>>
>>>> Matt Helsley wrote:
>>>>
>>>>>> That said, if the intent is to allow the restore to be done on
>>>>>> another node with a "similar" filesystem (e.g. created by rsync/node
>>>>>> image), instead of having a coherent distributed filesystem on all
>>>>>> of the nodes then the filename makes sense.
>>>>>>
>>>>> Yes, this is the intent.
>>>>>
>>>> I would worry about programs which are using files which have been
>>>> deleted, renamed, or (very common) renamed-over by another process
>>>> after being opened, as there's a good chance they will successfully
>>>> open the wrong file after c/r, and corrupt state from then on.
>>>>
>>> Userspace is expected to back up and restore the filesystem, for
>>> instance using a btrfs snapshot or a simple rsync or tar.
>>>
>>>
>> That does not solve the problem Jamie is talking about.
>> An rsync or a tar will not see a deleted file, and using btrfs just
>> to make c/r work with deleted files is a bit overkill, no?
>
> Let's separate the issues of file system snapshot and deleted files.
>
> 1) File system snapshot:
> ------------------------
> The requirement is to preserve the file system state between the time
> of the checkpoint and the time of the restart, because userspace will
> expect it to remain the same.
>
> The alternatives are:
>
> a) Use a capable file system, like btrfs, or (modified) nilfs.
>
> b) Userspace saves the state, e.g. w/ tar or rsync (maybe incremental)
>
> c) Assume/expect that the file system isn't modified between checkpoint
> and restart (e.g. if we use c/r to suspend a user's session)
>
> d) Expect userspace to adapt to changes if they occur, e.g. by having
> the application be aware of the possibility, or by providing a wrapper
> that will do some magic prior to restart (by looking at the checkpoint
> image).
>
> Options a, b, c are all transparent to the application, while option
> d requires that applications become aware of c/r. That's ok, but our
> primary goal is to be generic enough to support unmodified
> applications.
>
> 2) Deleted files:
> -----------------
> The requirement is that at restart we'll be able to restore the file
> pointer in the kernel to a deleted file with the same properties and
> contents as it had at the time of the checkpoint.
>
> The alternatives we considered are:
>
> e) For each deleted file, save the contents of that file as part of
> the checkpoint image;
> At restart - create a new file, populate it with the contents, open it
> (to get an active file pointer), and finally unlink it, so it is -
> again - deleted.
>
> f) At checkpoint time, create a file (from scratch) in a dedicated
> area of the file system (userspace configurable?), and copy the
> contents of the deleted file to this file. Only save the file system
> state after this is done.
> At restart, open the alternative file instead, and then immediately
> delete it.
>
> g) At checkpoint time, re-link the file to a dedicated area of the
> file system. This requires support from the underlying file system,
> of course. For instance, it's trivial for ext2,3 but IIRC will need
> help for ext4. Re-linking is essentially attaching a new filename
> to an existing inode that is still referenced but is otherwise not
> reachable - and making it reachable again.
> At restart, open the re-linked file and then immediately delete it.
>
>> I have another question about the deleted files. How is the case
>> handled when a process has a deleted mapped file but without an
>> associated file descriptor ?
>>
>
> It works the same as with non-deleted files (assuming that we know
> how to handle deleted files in general, e.g. options e, f, g above):
>
> To checkpoint a task's mm we loop through the vma's and checkpoint
> them. For a vma that corresponds to a mapped file, we first save
> the vma->vm_file.
> In turn, for a file pointer we save the filename,
> properties, credentials. A file pointer is saved as an independent
> object and is assigned a unique id, the objref. The state of the vma
> will indicate this objref.
>
> At restart, we will first see the file pointer object, and will
> open the file to create a corresponding file pointer. Later when
> we restore the vma, we'll locate the (new) file pointer using the
> objref and use it in mmap.
>
> Oren.
>

Thanks Oren for the detailed answer.
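
For readers following the thread, option (e) and the objref lookup can be
approximated from userspace. What follows is only a rough sketch under
stated assumptions, not the checkpoint/restart patch set's code: the names
restore_deleted_file, restore_mapping, objref_to_fd and MAX_OBJREFS are
hypothetical, the /tmp location is an arbitrary choice, and a file
descriptor stands in for the kernel's file pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_OBJREFS 16              /* hypothetical table size */

/* Hypothetical objref table: id from the image -> fd recreated at restart. */
static int objref_to_fd[MAX_OBJREFS];

static void objref_init(void)
{
    for (int i = 0; i < MAX_OBJREFS; i++)
        objref_to_fd[i] = -1;       /* -1 marks "not restored yet" */
}

/* Option (e): create a new file, populate it with the saved contents,
 * then unlink it while keeping it open, so it is deleted again. */
static int restore_deleted_file(const void *data, size_t len)
{
    char path[] = "/tmp/cr-deleted-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* The name goes away; the open descriptor keeps the inode alive. */
    if (unlink(path) < 0 || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Restoring a file-backed vma: find the restored file by objref, mmap it. */
static void *restore_mapping(int objref, size_t len)
{
    if (objref < 0 || objref >= MAX_OBJREFS || objref_to_fd[objref] < 0)
        return MAP_FAILED;
    return mmap(NULL, len, PROT_READ, MAP_PRIVATE, objref_to_fd[objref], 0);
}

/* Walk through the restart order: file object first, then the mapping. */
static int self_check(void)
{
    const char saved[] = "contents saved at checkpoint";
    size_t len = sizeof(saved) - 1;

    objref_init();
    int fd = restore_deleted_file(saved, len);
    if (fd < 0)
        return -1;
    objref_to_fd[3] = fd;           /* objref 3 is an arbitrary example id */

    char *map = restore_mapping(3, len);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int ok = memcmp(map, saved, len) == 0;
    munmap(map, len);
    close(fd);
    return ok ? 0 : -1;
}
```

The ordering mirrors the description in the thread: the file object is
recreated and registered under its objref first, and the mapping is set up
afterwards by looking that objref up. Options (f) and (g) would change only
how the file is obtained before the unlink.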