[RFC 14/26] union-mount: Documentation

Add simple documentation about union mounting in general and this
implementation in particular.

Signed-off-by: Jan Blunck <jblunck@xxxxxxx>
---
 Documentation/filesystems/union-mounts.txt |  172 +++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)

--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,172 @@
+VFS based Union Mounts
+----------------------
+
+ 1. What are "Union Mounts"
+ 2. The Union Stack
+ 3. Writable Unions: The White-out Filetype and Copy-On-Open
+ 4. Renaming Unions
+ 5. Directory Reading
+ 6. Known Problems
+ 7. References
+
+-------------------------------------------------------------------------------
+
+1. What are "Union Mounts"
+==========================
+
+Please note: this is NOT about UnionFS and it is NOT derived work!
+
+Traditionally the mount operation is opaque: the content of the mount point,
+the directory on which the file system is mounted, is hidden by the content
+of the mounted file system's root directory until the file system is
+unmounted again. Unlike the traditional UNIX mount mechanism, which hides the
+contents of the mount point, a union mount presents a view as if both file
+systems were merged together. Although only the topmost layer of the mount
+stack can be altered, a transparent file system mount makes it appear as if
+any file can be created, modified or deleted.
+
+Most people know the concepts and features of union mounts from other
+systems such as Sun's Translucent Filesystem, Plan 9 or BSD.
+
+Here are the key features of this implementation:
+- completely VFS based
+- does not change the namespace stacking
+- directory listings have duplicate entries removed
+- writable unions: only the topmost file system layer may be writable
+- writable unions: new white-out filetype handled inside the kernel
+
+-------------------------------------------------------------------------------
+
+2. The Union Stack
+==================
+
+The mounted file systems are organized in the "file system hierarchy" (a tree
+of vfsmount structures), which keeps track of the stacking of file systems
+upon each other. The per-directory view of the file system hierarchy is called
+the "mount stack" and reflects the order of the file systems that are mounted
+on a specific directory.
+
+Union mounts present a single unified view of the contents of two or more file
+systems as if they were merged together. Since the file system hierarchy does
+not directly record which file system objects are part of a unified view,
+a new structure is needed. The file system objects that are part of a unified
+view are ordered in a so-called "union stack". Only directories can be part
+of a unified view.
+
+The link between two layers of the union stack is maintained using the
+union_mount structure (#include <linux/union.h>):
+
+struct union_mount {
+       atomic_t u_count;               /* reference count */
+       struct mutex u_mutex;
+       struct list_head u_unions;      /* list head for d_unions */
+       struct hlist_node u_hash;       /* list head for searching */
+       struct hlist_node u_rhash;      /* list head for reverse searching */
+
+       struct path u_this;             /* this is me */
+       struct path u_next;             /* this is what I overlay */
+};
+
+The union_mount structure holds a reference (dget,mntget) to the next lower
+layer of the union stack. Since a dentry can be part of multiple unions
+(e.g. with bind mounts), its union_mount structures are tied together via the
+d_unions field of the dentry structure.
+
+All union_mount structures are cached in two hash tables, one for lookups of
+the next lower layer of the union stack and one for reverse lookups of the
+next upper layer of the union stack. The reverse lookup is necessary to
+resolve CWD relative path lookups. For calculation of the hash value, the
+(dentry,vfsmount) pair is used. The u_this field is used for the hash table
+which is used in forward lookups and the u_next field for the reverse lookups.
+
+During every new mount (or mount propagation), a new union_mount structure is
+allocated. A reference to the mountpoint's vfsmount and dentry is taken and
+stored in the u_next field. In almost the same manner a union_mount
+structure is created during the first lookup of a directory within a union
+mount point. In this case the lookup proceeds to all lower layers of the
+union. Therefore the complete union stack is constructed during lookups.
+
+The union_mount structures of a dentry are destroyed when the dentry itself is
+destroyed. The dentry cache therefore indirectly drives the union_mount cache,
+just as it does for the inode cache. Please note that lower layer
+union_mount structures are kept in memory until the topmost dentry is
+destroyed.
+
+-------------------------------------------------------------------------------
+
+3. Writable Unions: The White-out Filetype and Copy-On-Open
+===========================================================
+
+The white-out filetype isn't new. It has been there for quite some time now
+but Linux's VFS hasn't used it yet. With the availability of union mount code
+inside the VFS the white-out filetype becomes important for supporting
+writable union mounts. Read-only union mounts need neither white-outs nor
+copy-on-open.
+
+A white-out has the same function as a negative dentry: it describes a
+filename which isn't there. The creation of white-outs needs low-level
+filesystem support. At the time of writing, white-out support is
+available for tmpfs, ext2 and ext3. The VFS is extended to make the
+white-out handling transparent to all its users. White-outs are not
+visible to user space.
+
+-------------------------------------------------------------------------------
+
+4. Renaming Unions
+==================
+
+Rename on union mounts used to be handled in a lazy way: it returned -EXDEV.
+This works well for directories but not for regular files. Even a kernel build
+doesn't handle rename errors appropriately. Therefore, when a regular file on
+a lower layer of the union stack is renamed, it is copied to the topmost
+layer. If the file already resides on the topmost layer, the traditional
+rename method is used.
+
+-------------------------------------------------------------------------------
+
+5. Directory Reading
+====================
+
+As mentioned, union mounts present a single view of multiple directories as
+if they were merged together. This is achieved by reading the contents of
+every directory on the union stack and merging the results. When the directory
+listing is read via the readdir() or getdents() system call, the union stack
+is traversed from the topmost layer to the lowermost.
+
+As with regular files, directories are seekable and the position of the
+following read is marked by the file position filp->f_pos. When reading from
+multiple directories, it is possible that the file position exceeds the inode
+size of the first directory. Therefore the file position is rearranged to
+select the correct directory in the union stack. This is done by subtracting
+the inode size if the file position exceeds it and then selecting the next
+member of the union stack.
+
+This worked well with filesystems like ext2 that used flat file directories.
+The directory entry offsets are arranged linearly and are always smaller than
+the inode size of the directory. Modern filesystems have implemented
+directories differently and just return special cookies as directory entry
+offsets which are unrelated to the position in the directory or the inode
+size.
+
+-------------------------------------------------------------------------------
+
+6. Known Problems
+=================
+
+- seeking/readdir is currently not supported when d_off > i_size is possible
+- readdir() is a file operation
+- copyup() for filetypes other than reg and dir (e.g. for chown() on devices)
+
+-------------------------------------------------------------------------------
+
+7. References
+=============
+
+[1] http://marc.info/?l=linux-fsdevel&m=96035682927821&w=2
+[2] http://marc.info/?l=linux-fsdevel&m=117681527820133&w=2
+[3] http://marc.info/?l=linux-fsdevel&m=117913503200362&w=2
+[4] http://marc.info/?l=linux-fsdevel&m=118231827024394&w=2
+
+Authors:
+Jan Blunck <jblunck@xxxxxxx>
+Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx>
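
Note on section 5 (not part of the patch): the file position rearrangement
described there could be modelled in user space roughly as below. This is
only a sketch under assumptions, not the code from this series:
union_select_layer() and struct layer_pos are hypothetical names, the inode
sizes of the stacked directories are passed in as a plain array (topmost
layer first), and a position equal to a directory's size is assumed to belong
to the next layer. As the text says, this only holds for file systems whose
directory offsets are linear and smaller than i_size.

```c
#include <stddef.h>

/* Result of mapping a merged directory position onto one layer. */
struct layer_pos {
	size_t layer;      /* index into the union stack, 0 = topmost */
	long long offset;  /* position inside that layer's directory */
};

/*
 * Hypothetical helper: walk the stack from the topmost layer down and
 * subtract each directory's inode size until the position fits.
 * Returns 0 on success, -1 if pos lies beyond the last layer.
 */
static int union_select_layer(const long long *i_size, size_t nr_layers,
			      long long pos, struct layer_pos *out)
{
	size_t i;

	for (i = 0; i < nr_layers; i++) {
		if (pos < i_size[i]) {
			out->layer = i;
			out->offset = pos;
			return 0;
		}
		/* Position is past this directory: move to the next layer. */
		pos -= i_size[i];
	}
	return -1;
}
```

With directory sizes of 4096, 8192 and 4096 bytes, a merged position of 4096
would select the second layer at offset 0. A file system that returns opaque
cookies as d_off cannot be handled this way, which is the first item under
"Known Problems".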

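Note on section 2 (not part of the patch): the forward and reverse lookups
over the union stack might be pictured with the user-space model below. It is
a simplified sketch, not the implementation: struct path_key, struct
union_link and the union_find_*() helpers are hypothetical names, the two hash
tables are replaced by a linear scan over one array, and reference counting
and locking are left out. The "upper" and "lower" members play the roles of
u_this and u_next.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct path: a (vfsmount, dentry) pair. */
struct path_key {
	const void *mnt;
	const void *dentry;
};

/* Toy link between two layers of a union stack. */
struct union_link {
	struct path_key upper;  /* corresponds to u_this */
	struct path_key lower;  /* corresponds to u_next */
};

static int path_equal(const struct path_key *a, const struct path_key *b)
{
	return a->mnt == b->mnt && a->dentry == b->dentry;
}

/* Forward lookup: match on the upper path, return the next lower layer. */
static const struct path_key *
union_find_lower(const struct union_link *tbl, size_t n,
		 const struct path_key *upper)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (path_equal(&tbl[i].upper, upper))
			return &tbl[i].lower;
	return NULL;
}

/* Reverse lookup: match on the lower path, return the next upper layer. */
static const struct path_key *
union_find_upper(const struct union_link *tbl, size_t n,
		 const struct path_key *lower)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (path_equal(&tbl[i].lower, lower))
			return &tbl[i].upper;
	return NULL;
}

/* Number of layers reachable from top, assuming the links form no cycle. */
static size_t union_stack_depth(const struct union_link *tbl, size_t n,
				const struct path_key *top)
{
	size_t depth = 1;

	while ((top = union_find_lower(tbl, n, top)) != NULL)
		depth++;
	return depth;
}
```

The reverse direction is the one section 2 says is needed for CWD relative
path lookups; keying it on the lower path mirrors the statement that u_next
is used for the reverse hash table.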