On Monday 02 June 2008, hooanon05@xxxxxxxxxxx wrote:
> > * data inconsistency problems when simultaneously accessing the underlying
> >   fs and the union.
>
> Aufs has three levels of detecting direct access to the lower
> (branch) filesystems (i.e. bypassing aufs). I guess the strictest level
> is a good answer to your question. It is based on the inotify
> feature. Aufs sets an inotify watch on every accessed directory on the
> lower fs. While those inodes are cached, aufs receives the inotify events
> for their children/files and marks the aufs data for the file as
> obsolete. When the file is accessed later, aufs retrieves the latest
> inode (or dentry) again.
> The inotify watch will be removed when the aufs dir inode is discarded
> from cache.

This is a very complicated approach, and I'm not sure if it even addresses
the case where you have a shared mmap on both files. With VFS-based union
mounts, they share one inode, so you don't need to use idiotify in the
first place, and it automatically works on shared mmaps.

> > * duplication of dentry and inode data structures in the union wastes
> >   memory and cpu cycles.
>
> Aufs has its own dentry and inode objects, as a normal fs has. And they
> have pointers to the corresponding ones on the lower fs. If you make a
> union from two real filesystems, then the aufs inode will have (at most)
> two pointers as its private data.
> Do you mean having pointers is a duplication?

I mean having your own dentry and inode objects is duplication. The
underlying file system already has them, so if you have your own, you need
to keep them synchronized. I guess that in order to do a lookup on a file,
you need the steps of

 1. lookup in aufs dentry cache -> fail
 2. lookup in underlying dentry cache -> fail
 3. try to read dentry from disk -> fail
 4. repeat 2-3 until found, or arrive at lowest level
 5. create an inode in memory for the lower file system
 6. create dentry in memory on lower file system, pointing to that
 7. create an aufs specific inode pointing to the underlying inode
 8. create an aufs specific dentry object to point to that
 9. create a struct inode representing the aufs inode
10. create another VFS dentry to point to that

when you really should just return the dentry found by the lower file
system.

> > * whiteouts are in the same namespace as regular files, so conflicts are
> >   possible.
>
> Yes, that's right.
> Aufs reserves ".wh." as a whiteout prefix, and prohibits users from
> handling such filenames inside aufs. It might be a problem as you wrote,
> but users can create/remove them directly on the lower fs and I have
> never received a request about this reserved prefix.

It's not so much a practical limitation as an exploitable feature. E.g. an
unprivileged user may use this to get an application into an error
condition by asking for an invalid file name. POSIX reserves a well-defined
set of invalid file names, and deviating from this means that you are not
compliant, and that in a potentially unexpected way.

> > * mounting a large number of aufs on top of each other eventually
> >   overflows the kernel stack, e.g. in readdir.
>
> The aufs readdir operation consumes memory, but it is not stack. If it
> was implemented as a recursive function, it might cause a stack
> overflow. But actually it is a loop.
> The memory is used for storing entry names and eliminating whiteout-ed
> ones, and the result will be cached for a specified time. So memory
> (other than stack) will be consumed.

How does aufs know that one of its branches is an aufs itself? If you
detect this, do you fold it into a single aufs instance with more branches?
In case you don't do it, I don't see how you get around the stack overflow,
but if you do it, you have again added a whole lot of complexity for
something that should be trivial when done right.
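
To make the stacking concern concrete, here is a minimal sketch in C. It is
a hypothetical model, not aufs code: struct fs and model_readdir are names
invented for illustration. It assumes that each stacked instance enters its
branches through their own readdir and does not fold nested instances
together. Under that assumption, the loop inside one instance does not help:
the call nesting still grows by one frame per stacked instance.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not aufs code. A "filesystem" is either a real one
 * (no branches) or a stacked one whose readdir visits each branch.
 */
struct fs {
	const char *name;
	struct fs **branches;	/* NULL for a real filesystem */
	size_t nbranches;
};

/*
 * Model of readdir: returns the deepest call nesting reached, where
 * `depth` is the nesting of the current call (1 for the top-level call).
 */
static int model_readdir(const struct fs *fs, int depth)
{
	int deepest = depth;
	size_t i;

	assert(fs != NULL);

	if (fs->branches == NULL)
		return depth;	/* real filesystem: no further calls */

	/* Inside one stacked instance, the branches are visited in a loop. */
	for (i = 0; i < fs->nbranches; i++) {
		/*
		 * Each branch is still entered through its own readdir, so a
		 * branch that is itself stacked adds one more call frame.
		 */
		int d = model_readdir(fs->branches[i], depth + 1);

		if (d > deepest)
			deepest = d;
	}
	return deepest;
}
```

In this model a flat union over two real filesystems reaches a nesting of 2,
and every additional stacked instance mounted on top adds one more level.
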
> > * allowing multiple writable branches (instead of just stacking
> >   one rw copy on a number of ro file systems) is confusing to the user
> >   and complicates the implementation a lot.
>
> Probably you are right. Initially aufs had only one policy to select the
> writable branch. But several users requested another policy such as
> round-robin or most-free-space, and aufs has implemented them.
> I don't think users will be confused by these policies. While I tried to
> keep it simple, I guess some people will say it is complex.

I personally think that a policy other than writing to the top is crazy
enough, but randomly writing to multiple places is much worse, as it
becomes unpredictable what the file system does, not just unexpected.

	Arnd <><