On Mon, Nov 10, 2014 at 10:25:40AM +0100, Miklos Szeredi wrote: > On Sun, Nov 9, 2014 at 12:42 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Fri, Nov 07, 2014 at 11:09:59AM -0800, Christoph Hellwig wrote: > >> The overlayfs merge introduces a new rename flag to create to whiteouts. > >> Should be a fairly easy to implement. > >> > >> Miklos, do you have any good documentation and/or test cases for this? > > > > So overlayfs uses some weird char dev hack to implement whiteout > > inodes in directories? Why do we need a whiteout inode on disk? > > what information is actually stored in the whiteout inode that > > overlayfs actually needs? Only readdir and lookup care about > > whiteouts, and AFAICT nothing of the inode is ever used except > > checking the chrdev/whiteoutdev hack via ovl_is_whiteout(dentry). > > > > Indeed, whatever happened to just storing the whiteout in the dirent > > via DT_WHT and using that information on lookup in the lower > > filesystem to mark the dentry returned appropriately without needing > > to lookup a real inode? > > The filesystem is free to implement whiteouts a dirent without an actual inode. Sure, but overlayfs won't make use of it, so we'd have have to hack around overlayfs's ignorance of DT_WHT in several different places to do this. e.g. - in mknod to intercept creation of magical whiteout chardevs - in readdir so we can convert them to DT_CHR so overlayfs can detect them, - in ->lookup so we can create magical chardev inodes in memory rather than try to read them from disk. - in rename we have to detect the magical chardev inodes so we know it's a whiteout we are dealing with This is difficult because overlayfs hard codes the definition of a whiteout into the VFS interface implementation as well as it's internal directory implementation. This leaves almost no room for anyone to optimise the back end implementation because the translation layers are complex and fiddly.... > But we do need at least an inode in the VFS, since the whiteout needs > to be presented to userspace when not part of the overlay. Sure, but that's a different problem. > The DT_WHT > makes the typical mistake of trying to make the implementation nice, > while not caring about user interfaces. You're implying the d_type field in a dirent is something that it is not. d_type has only one purpose in life - to allowing userspace applications to avoid a stat() call to find out the type of the object the dirent points to. > This is usually a big mistake, user interfaces are much more important > than implementation details, and an already existing file type on > which all the usual operations work (stat, unlink) is much better in > this respect than a completely new object which is unknown and > unmanageable for the vast majority of applications. Sure, but again that's not the issue I'm commenting on. The dirent type has no effect on stat, unlink, etc that are done on the dirent after it is returned to userspace. So why is overloading DT_CHR to mean 'either a char device or a whiteout entry' a sane user interface design decision? d_type *was* a simple, obvious, effective and efficient user interface that allowed users to avoid extra syscalls. It's been used this way by userspace for, what, 15 years? With the overlayfs "magic" we now have the situation where d_type is not sufficient to avoid a stat() call to determine the type the dirent points to. IOWs, we've just fucked up a perfectly good interface that is widely used because somebody thought that using the DT_WHT value in the d_type field for whiteout dirents is a "bad interface". > The special chardev was Linus' idea, but I agree with him completely > on this point. Introducing DT_WHT on the userspace API would be much > more of a hack than reusing existing objects and operations. Magical char dev for access, unlink, etc: no problems there. DT_CHR for the whiteout dirent type: completely fucked up. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs