Hello,

On Tue, Jul 26, 2011 at 03:59:11PM -0700, Matt Helsley wrote:
> > /proc/PID/fd already provides access to deleted files perfectly well
> > as most avid p0rn watchers would know (you can run mplayer on flash's
> > deleted temp files). ;)
>
> Yup, access to the unlinked file contents. This is an example where
> things appear simple and complete in /proc yet it is insufficient.
> Here's what you'll need:
>
> The string "(deleted)" in a file name is, strictly speaking, ambiguous --
> it does not mean the file is unlinked. You also can't infer that it is
> unlinked by stat()'ing that path since a different file could have
> been created in the same spot. For something unambiguous you'll
> have to add that information to /proc somewhere. fdinfo doesn't seem
> to be the right place since fds aren't unlinked -- files are.

Hmm... but wouldn't fstat() after open reveal the original inode?
ie.

  $ cat fstat.c
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <unistd.h>
  #include <stdio.h>
  #include <fcntl.h>
  #include <assert.h>

  int main(int argc, char **argv)
  {
          int fd;
          struct stat st = {};

          assert((fd = open(argv[1], O_RDONLY)) >= 0);
          assert(!fstat(fd, &st));
          printf("ino=%lu nlink=%lu\n",
                 (unsigned long)st.st_ino, (unsigned long)st.st_nlink);
          return 0;
  }
  $ gcc -Wall -o fstat fstat.c
  $ cat > asdf &
  [7] 31908
  $ ./fstat asdf
  ino=9180912 nlink=1
  $ ./fstat /proc/31908/fd/1
  ino=9180912 nlink=1
  $ rm -f asdf
  $ ./fstat /proc/31908/fd/1
  ino=9180912 nlink=0
  $ touch asdf
  $
  $ ./fstat asdf
  ino=9180915 nlink=1

I don't think anything is ambiguous.

> Then you've got to detect when they're the same unlinked file and share
> the copy upon restart. Or they could be different unlinked files
> with the same path in which case you should not share the copy. I suppose
> you'll have to check the device and inode and then see if any other task
> being checkpointed has it open... once for each of potentially thousands
> of fds being checkpointed.

Just build a hash table w/ fstat results.
It's O(nr_open_files) whether you do that or not.

> Then there's the case where you've got one unlinked dentry for the
> file but a hardlink elsewhere. The /proc/PID/fd path won't point to the
> hardlinked location. So in order for those to be the same file upon
> restart you need to find the file somehow during checkpoint and/or
> restart.

You can determine whether search for another hardlink is necessary by
looking at nlink. Hmm... I wonder whether open_by_handle_at() can be
used for this instead of scanning filesystem for matching inode
number. Screening by nlink should eliminate most cases but if
open_by_handle_at() can deal with actual cases, it would be much
better.

> Finally these files often can be huge. Copying them elsewhere is a huge IO
> burden compared to careful relinking of the file. IO that could be better
> spent doing actual work.
>
> We solved all that with "relinking". It's possible to make a relink()
> syscall. The code I posted some time ago to containers@ can be easily
> adapted for that -- I did so for my testing of those patches. I'm not
> exactly sure how it would be done from userspace but I suspect it could
> be done.

Yeah, something like flink (like fstat for stat) should do it. FS
methods operate on dentries anyway so it can be added in the vfs layer
proper if necessary.

Thanks.

-- 
tejun
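
For illustration only, a minimal sketch of the "hash table w/ fstat
results" idea together with the nlink screening, assuming the fds being
checkpointed have already been opened (e.g. via /proc/PID/fd). The names
file_table, file_ent, file_table_add(), has_other_links() and NBUCKETS
are hypothetical and not taken from any posted patch. Note that nlink is
only a snapshot taken at the first fstat() of each file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>     /* for callers that want to assert on results */
#include <fcntl.h>      /* open(), for callers */
#include <unistd.h>     /* unlink(), close(), for callers */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

#define NBUCKETS 1024   /* hypothetical size, arbitrary for this sketch */

/* One entry per distinct file, keyed by (st_dev, st_ino). */
struct file_ent {
        dev_t dev;
        ino_t ino;
        nlink_t nlink;          /* link count seen at the first fstat() */
        unsigned int refs;      /* number of fds that map to this file */
        struct file_ent *next;
};

struct file_table {
        struct file_ent *buckets[NBUCKETS];
};

static unsigned int file_hash(dev_t dev, ino_t ino)
{
        unsigned long long v;

        v = (unsigned long long)dev * 31ULL + (unsigned long long)ino;
        return (unsigned int)(v % NBUCKETS);
}

/*
 * Record the file behind @fd.  Two fds get the same entry only if
 * fstat() reports the same device and inode, regardless of the path
 * they were opened through.  Returns NULL on fstat() or allocation
 * failure.
 */
static struct file_ent *file_table_add(struct file_table *t, int fd)
{
        struct stat st;
        struct file_ent *e;
        unsigned int h;

        if (fstat(fd, &st) < 0)
                return NULL;

        h = file_hash(st.st_dev, st.st_ino);
        for (e = t->buckets[h]; e; e = e->next) {
                if (e->dev == st.st_dev && e->ino == st.st_ino) {
                        e->refs++;
                        return e;
                }
        }

        e = calloc(1, sizeof(*e));
        if (!e)
                return NULL;
        e->dev = st.st_dev;
        e->ino = st.st_ino;
        e->nlink = st.st_nlink;
        e->refs = 1;
        e->next = t->buckets[h];
        t->buckets[h] = e;
        return e;
}

/*
 * Screening by nlink: for a file whose /proc/PID/fd path is gone, a
 * non-zero link count means some other hardlink still exists and a
 * search (or relink) may be needed; zero means none exists.
 */
static int has_other_links(const struct file_ent *e)
{
        return e->nlink > 0;
}

static void file_table_free(struct file_table *t)
{
        unsigned int i;

        for (i = 0; i < NBUCKETS; i++) {
                while (t->buckets[i]) {
                        struct file_ent *e = t->buckets[i];

                        t->buckets[i] = e->next;
                        free(e);
                }
        }
}
```

Each fd costs one fstat() plus one bucket walk, so the whole pass stays
O(nr_open_files). An entry with refs > 1 is a file shared between fds,
and a file re-created at the same path lands in a separate entry because
its inode differs.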