On Tue, May 04, 2010 at 10:12:09PM +0100, Jamie Lokier wrote:
> Valerie Aurora wrote:
> > +File copyup: Create a file on the top layer that has the same metadata
> > +and contents as the file with the same pathname on the bottom layer.
>
> Can copyup be interrupted?  E.g. if I chmod an 80GB file, will the
> chmod() system call pause for a couple of hours, or can I control-C it?

The right behavior is that you should be able to control-C it, but I
doubt that currently works.  Let me look into testing and implementing
this.

> > +This deviation from standard is due to technical limitations of the
> > +union mount implementation.  Specifically, we would need to replace an
> > +open file descriptor from the lower layer with an open file descriptor
> > +for a file with matching pathname and contents on the upper layer,
> > +which is difficult to do.  We avoid this in other system calls by
> > +doing the copyup before the file is opened.  Unionfs doesn't encounter
> > +this problem because it creates a dummy file struct which redirects or
> > +fans out operations to the struct files for the underlying file
> > +systems.
> > +
> > +From an application's point of view, the result of an in-kernel file
> > +copyup is the logical equivalent of another application updating the
> > +file via the rename() pattern: creat() a new file, copy the data over,
> > +make changes the copy, and rename() over the old version.  Any
> > +existing open file descriptors for that file (including those in the
> > +same application) refer to a now invisible object that used to have
> > +the same pathname.  Only opens that occur after the copyup will see
> > +updates to the file.
>
> Does it apply the same permission checks that a program doing
> copy+rename would have to pass?  I guess that is just write access to
> the directory.

Yes.
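To make the copy + rename() analogy in the quoted text concrete, here is
a minimal userspace sketch in Python.  It runs on an ordinary POSIX file
system, not on a union mount, so it only illustrates the semantics that
copyup is being compared to.  The helper name update_via_rename and the
".tmp" suffix are made up for illustration.

```python
import os
import shutil
import tempfile


def update_via_rename(path, new_data):
    """Update `path` with the creat/copy/modify/rename pattern."""
    tmp_path = path + ".tmp"          # hypothetical temporary name
    shutil.copy2(path, tmp_path)      # copy data and metadata to a new file
    with open(tmp_path, "ab") as tmp:
        tmp.write(new_data)           # make changes to the copy
    os.rename(tmp_path, path)         # rename() over the old version


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "file")
with open(path, "wb") as f:
    f.write(b"old\n")

old_fd = open(path, "rb")             # opened before the update
update_via_rename(path, b"new\n")

with open(path, "rb") as new_fd:      # opened after the update
    after = new_fd.read()
before = old_fd.read()                # reads the old, now nameless file
old_fd.close()
shutil.rmtree(workdir)

print(before)
print(after)
```

The descriptor opened before the update keeps reading the old contents,
while an open() made afterwards sees the new file.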

> Does it effectively "rename" all hard links referring to the file, to
> point to the new version, or does it only affect the path that was
> used by the writer/modifier, leaving the other links continue to refer
> to the original file?

In order to update all the hard links to a file, we would have to walk
the entire file system searching for links with a matching inode
number and copy them up too.  We're never going to do a
file-system-wide walk, so we won't do that.  The other hard links
still point to the old copy of the file.  We hope applications don't
commonly depend on this.

> > + - File copyup on open(O_DIRECT)
>
> Why is O_DIRECT relevant?  O_DIRECT doesn't imply writing, and
> copy+rename behaviour is the same with O_DIRECT as not.
>
> Some programs use O_DIRECT to read very large files, without intending
> they will ever be modified.  For example, qemu using O_DIRECT to
> access a disk image backing file.

You're right, this is a mistake.

> > +NFS interaction
> > +===============
> > +
> > +NFS is currently not supported as either type of layer.  NFS as
> > +read-only layer requires support from the server to honor the
> > +read-only guarantee needed for the bottom layer.  To do this, the
> > +server needs to revoke access to clients requesting read-only file
> > +systems if the exported file system is remounted read-write or
> > +unmounted (during which arbitrary changes can occur).  Some recent
> > +discussion:
> > +
> > +http://markmail.org/message/3mkgnvo4pswxd7lp
> > +
> > +NFS as the read-write layer would require implementation of the
> > +->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
> > +theoretically already supported.
> > +
> > +Also, technically the requirement for a readdir() cookie that is
> > +stable across reboots comes only from file systems exported via NFSv2:
> > +
> > +http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
> > +
> > +Todo:
> > +
> > +- Guarantee really really read-only on NFS exports
> > +- Implement whiteout()/fallthru() for NFS
>
> I'm finding it hard to imagine _guaranteeing_ really read-only.  All
> you can guarantee is that the NFS says it is read-only.
>
> For example, a userspace NFS server cannot prevent the filesystem it's
> serving from changing.

We're discussing how to detect this now.

> Is this not a problem with other network filesystems like CIFS, P9, FUSE?

Each file system that wants to support union mounts will need to
implement the features necessary for that layer (hard read-only for
the lower layer, whiteouts and fallthrus for the upper layer).

> > +Known non-POSIX behaviors
> > +-------------------------
> > +
> > +- Link count may be wrong for files on bottom layer with > 1 link count
>
> Can you say a bit more about what will be seen?

Sure, I'll write up an example.

> > +- File copyup is the logical equivalent of an update via copy +
> > +  rename().  Any existing open file descriptors will continue to refer
> > +  to the read-only copy on the bottom layer and will not see any
> > +  changes that occur after the copy-up.
>
> I can imagine some database-like programs getting confused by that.
>
> Maybe it would be better to fail copyup operations when the file is
> currently open O_RDONLY by anyone, analogous to the way writable
> mounts are refused when any union holds it read-only?
>
> Are there uses likely to be broken by that behaviour?

That's an interesting question.  In general, this seems like a bad
idea - any process can prevent another process from writing to a file
by opening it.  This is like chmod'ing it to 444.
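
On the hard link question earlier in this message, the same copy +
rename() analogy can be sketched in userspace.  This is only an
illustration on an ordinary POSIX file system, with made-up file names
"a" and "b".  It says nothing about what st_nlink reports on an actual
union mount.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
name_a = os.path.join(workdir, "a")
name_b = os.path.join(workdir, "b")

with open(name_a, "w") as f:
    f.write("original\n")
os.link(name_a, name_b)               # "a" and "b" now share one inode

links_before = os.stat(name_a).st_nlink

# Stand-in for a copyup that is triggered through the name "a" only.
tmp = name_a + ".tmp"                 # hypothetical temporary name
shutil.copy2(name_a, tmp)
with open(tmp, "a") as f:
    f.write("modified\n")
os.rename(tmp, name_a)

same_inode = os.stat(name_a).st_ino == os.stat(name_b).st_ino
links_a = os.stat(name_a).st_nlink
links_b = os.stat(name_b).st_nlink
with open(name_b) as f:
    content_b = f.read()              # the other link still has the old data
shutil.rmtree(workdir)

print(links_before, same_inode, links_a, links_b, repr(content_b))
```

After the rename, "a" names a new inode and "b" still names the old one
with the old contents, which matches the "other hard links still point
to the old copy" behavior described above.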

-VAL