From: Junjiro Okajima <hooanon05@xxxxxxxxxxx>

initial commit
aufs manual

Signed-off-by: Junjiro Okajima <hooanon05@xxxxxxxxxxx>
---
 Documentation/filesystems/aufs/aufs.5 | 1608 +++++++++++++++++++++++++++++++++
 1 files changed, 1608 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/aufs/aufs.5 b/Documentation/filesystems/aufs/aufs.5
new file mode 100644
index 0000000..7335e14
--- /dev/null
+++ b/Documentation/filesystems/aufs/aufs.5
@@ -0,0 +1,1608 @@
+.ds AUFS_VERSION 20080516-mm
+.ds AUFS_XINO_FNAME .aufs.xino
+.ds AUFS_XINO_DEFPATH /tmp/.aufs.xino
+.ds AUFS_DIRWH_DEF 3
+.ds AUFS_WH_PFX .wh.
+.ds AUFS_WH_PFX_LEN 4
+.ds AUFS_WKQ_NAME aufsd
+.ds AUFS_NWKQ_DEF 4
+.ds AUFS_WH_DIROPQ .wh..wh..opq
+.ds AUFS_WH_BASENAME .wh.aufs
+.ds AUFS_WH_PLINKDIR .wh.plink
+.ds AUFS_BRANCH_MAX 127
+.ds AUFS_MFS_SECOND_DEF 30
+.\".so aufs.tmac
+.
+.eo
+.de TQ
+.br
+.ns
+.TP \$1
+..
+.de Bu
+.IP \(bu 4
+..
+.ec
+.\" end of macro definitions
+.
+.\" ----------------------------------------------------------------------
+.TH aufs 5 \*[AUFS_VERSION] Linux "Linux Aufs User\[aq]s Manual"
+.SH NAME
+aufs \- another unionfs. version \*[AUFS_VERSION]
+
+.\" ----------------------------------------------------------------------
+.SH DESCRIPTION
+Aufs is a stackable unification filesystem like Unionfs, which unifies
+several directories and provides a merged single directory.
+In the early days, aufs was an entirely re-designed and re-implemented
+Unionfs Version 1.x series. After
+many original ideas, approaches and improvements, it
+has become totally different from Unionfs while keeping the basic features.
+See Unionfs Version 1.x series for the basic features.
+Recently, Unionfs Version 2.x series has begun taking some of the same
+approaches as aufs\[aq]s.
+
+.\" ----------------------------------------------------------------------
+.SH MOUNT OPTIONS
+At mount-time, the order of interpreting options is,
+.RS
+.Bu
+simple flags, except xino/noxino, udba=inotify and dlgt
+.Bu
+branches
+.Bu
+xino/noxino
+.Bu
+udba=inotify
+.Bu
+dlgt
+.RE
+
+At remount-time,
+the options are interpreted in the given order,
+i.e. left to right, except dlgt. The \[oq]dlgt\[cq] option is
+disabled during interpretation.
+.RS
+.Bu
+create or remove
+whiteout-base(\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME]) and
+whplink-dir(\*[AUFS_WH_PFX]\*[AUFS_WH_PLINKDIR]) if necessary
+.Bu
+re-enable dlgt if necessary
+.RE
+.
+.TP
+.B br:BRANCH[:BRANCH ...] (dirs=BRANCH[:BRANCH ...])
+Adds new branches.
+(cf. Branch Syntax).
+
+Aufs rejects a branch which is an ancestor or a descendant of another
+branch. Such branches are called overlapped. When the branch is a loopback-mounted
+directory, aufs also checks the source fs-image file of the loopback
+device. If the source file is a descendant of another branch, it will
+be rejected too.
+
+After mounting aufs or adding a branch, if you move a branch under
+another branch and make it a descendant of another branch, aufs will not
+work correctly.
+.
+.TP
+.B [ add | ins ]:index:BRANCH
+Adds a new branch.
+The index begins with 0.
+Aufs creates
+whiteout-base(\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME]) and
+whplink-dir(\*[AUFS_WH_PFX]\*[AUFS_WH_PLINKDIR]) if necessary.
+
+If there is the same named file on the lower branch (larger index),
+aufs will hide the lower file.
+You can only see the highest file.
+You may be confused if the added branch has whiteouts (including
+diropq), since they may or may not hide the lower entries.
+.\" It is recommended to make sure that the added branch has no whiteout.
+
+If a process has once mapped a file by mmap(2) with MAP_SHARED
+and the same named file exists on the lower branch,
+the process still refers to the file on the lower (hidden)
+branch after adding the branch.
+If you want to update the contents of a process address space after
+adding, you need to restart your process or open/mmap the file again.
+.\" Usually, such files are executables or shared libraries.
+(cf. Branch Syntax).
+.
+.TP
+.B del:dir
+Removes a branch.
+Aufs does not remove
+whiteout-base(\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME]) and
+whplink-dir(\*[AUFS_WH_PFX]\*[AUFS_WH_PLINKDIR]) automatically.
+For example, when you add a RO branch which was unified as RW, you
+will see whiteout-base or whplink-dir on the added RO branch.
+
+If a process is referencing a file/directory on the branch being deleted
+(by open, mmap, current working directory, etc.), aufs will return an
+error EBUSY.
+.
+.TP
+.B mod:BRANCH
+Modifies the permission flags of the branch.
+Aufs creates or removes
+whiteout-base(\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME]) and/or
+whplink-dir(\*[AUFS_WH_PFX]\*[AUFS_WH_PLINKDIR]) if necessary.
+
+If the branch permission is being changed from \[oq]rw\[cq] to \[oq]ro\[cq], and a process
+is mapping a file by mmap(2)
+.\" with MAP_SHARED
+on the branch, the process may or may not
+be able to modify its mapped memory region after modifying branch
+permission flags.
+(cf. Branch Syntax).
+.
+.TP
+.B append:BRANCH
+equivalent to \[oq]add:(last index + 1):BRANCH\[cq].
+(cf. Branch Syntax).
+.
+.TP
+.B prepend:BRANCH
+equivalent to \[oq]add:0:BRANCH.\[cq]
+(cf. Branch Syntax).
+.
+.TP
+.B xino=filename
+Use external inode number bitmap and translation table. It is set to
+<FirstWritableBranch>/\*[AUFS_XINO_FNAME] by default, or
+\*[AUFS_XINO_DEFPATH].
+A comma character in the filename is not allowed.
+
+The files are created per aufs and per branch filesystem, and
+unlinked. So you
+cannot find this file, but it exists and is read/written frequently by
+aufs.
+(cf. External Inode Number Bitmap and Translation Table).
+.
+.TP
+.B noxino
+Stop using external inode number bitmap and translation table.
+
+If you use this option,
+some applications will not work correctly.
+.\" And pseudo link feature will not work after the inode cache is
+.\" shrunk.
+(cf. External Inode Number Bitmap and Translation Table).
+.
+.TP
+.B trunc_xib
+Truncate the external inode number bitmap file. The truncation is done
+automatically when you delete a branch unless you specify the
+\[oq]notrunc_xib\[cq] option.
+(cf. External Inode Number Bitmap and Translation Table).
+.
+.TP
+.B notrunc_xib
+Stop truncating the external inode number bitmap file when you delete
+a branch.
+(cf. External Inode Number Bitmap and Translation Table).
+.
+.TP
+.B create_policy | create=CREATE_POLICY
+.TQ
+.B copyup_policy | copyup | cpup=COPYUP_POLICY
+Policies to select one among multiple writable branches. The default
+values are \[oq]create=tdp\[cq] and \[oq]cpup=tdp\[cq].
+The link(2) and rename(2) systemcalls are exceptions. In aufs, they
+try keeping their operations in the branch where the source exists.
+(cf. Policies to Select One among Multiple Writable Branches).
+.
+.TP
+.B verbose | v
+Print some information.
+Currently, it is only the busy file (or inode) when deleting a branch.
+.
+.TP
+.B noverbose | quiet | q | silent
+Disable \[oq]verbose\[cq] option.
+This is the default value.
+.
+.TP
+.B dirwh=N
+Watermark to remove a dir actually at rmdir(2) and rename(2).
+
+If the target dir which is being removed or renamed (destination dir)
+has a huge number of whiteouts, i.e. the dir is empty logically but
+not physically, the cost to remove/rename the single
+dir may be very high.
+It is
+required to unlink all of the whiteouts internally before issuing
+rmdir/rename to the branch.
+To reduce the cost of a single systemcall,
+aufs renames the target dir to a whiteout-ed temporary name and
+invokes a pre-created
+kernel thread to remove whiteout-ed children and the target dir.
+The rmdir/rename systemcall returns just after kicking the thread.
+
+When the number of whiteout-ed children is less than the value of
+dirwh, aufs removes them in a single systemcall instead of passing
+them to another thread.
+This value is ignored when the branch is NFS.
+The default value is \*[AUFS_DIRWH_DEF].
+.
+.TP
+.B plink
+.TQ
+.B noplink
+Specifies whether to use the \[oq]pseudo link\[cq] feature or not.
+The default is \[oq]plink\[cq] which means use this feature.
+(cf. Pseudo Link)
+.
+.TP
+.B clean_plink
+Removes all pseudo-links in memory.
+In order to make pseudo-link permanent, use
+\[oq]auplink\[cq] script just before one of these operations,
+unmounting aufs,
+using \[oq]ro\[cq] or \[oq]noplink\[cq] mount option,
+deleting a branch from aufs,
+adding a branch into aufs,
+or changing your writable branch to readonly.
+If you installed both of /sbin/mount.aufs and /sbin/umount.aufs, and your
+mount(8) and umount(8) support them, and /etc/default/auplink is configured,
+\[oq]auplink\[cq] script will be executed automatically and flush pseudo-links.
+(cf. Pseudo Link)
+.
+.TP
+.B udba=none | reval | inotify
+Specifies the level of UDBA (User\[aq]s Direct Branch Access) test.
+(cf. User\[aq]s Direct Branch Access and Inotify Limitation).
+.
+.TP
+.B diropq=whiteouted | w | always | a
+Specifies whether mkdir(2) and rename(2) dir case make the created directory
+\[oq]opaque\[cq] or not.
+In other words, to create \[oq]\*[AUFS_WH_DIROPQ]\[cq] under the created or renamed
+directory, or not to create.
+When you specify diropq=w or diropq=whiteouted, aufs will not create
+it if the
+directory was not whiteouted or opaqued. If the directory was whiteouted
+or opaqued, the created or renamed directory will be opaque.
+When you specify diropq=a or diropq=always, aufs will always create
+it regardless of whether
+the directory was whiteouted/opaqued or not.
+The default value is diropq=w, which means not to create when it is unnecessary.
+If you define CONFIG_AUFS_COMPAT at aufs compiling time, the default will be
+diropq=a.
+You need to consider this option if you are planning to add a branch later
+since \[oq]diropq\[cq] affects the same named directory on the added branch.
+.
+.TP
+.B warn_perm
+.TQ
+.B nowarn_perm
+When adding a branch, aufs will issue a warning about uid/gid/permission of
+the branch directory being added,
+when they differ from the existing branch\[aq]s. This difference may or
+may not impose a security risk.
+If you are sure that there is no problem and want to stop the warning,
+use \[oq]nowarn_perm\[cq] option.
+The default is \[oq]warn_perm\[cq] (cf. DIAGNOSTICS).
+.
+.TP
+.B coo=none | leaf | all
+Specifies copyup-on-open level.
+When you open a file which is on a readonly branch, aufs opens the file after
+copying-up it to the writable branch following this level.
+When the keyword \[oq]all\[cq] is specified, aufs copies-up the opening object even if
+it is a directory. In this case, simple \[oq]ls\[cq] or \[oq]find\[cq] cause the copyup and
+your writable branch will have a lot of empty directories.
+When the keyword \[oq]leaf\[cq] is specified, aufs copies-up the opening object except
+directory.
+The keyword \[oq]none\[cq] disables copyup-on-open.
+The default is \[oq]coo=none\[cq].
+.
+.TP
+.B dlgt
+.TQ
+.B nodlgt
+If you do not want your application to access branches through aufs or
+to be traced strictly by task I/O accounting, you can
+use the kernel threads in aufs. If you enable CONFIG_AUFS_DLGT and
+specify \[oq]dlgt\[cq] mount option, then
+aufs delegates its internal
+access to the branches to the kernel threads.
+
+When you define CONFIG_SECURITY and use any type of Linux Security Module
+(LSM), for example SUSE AppArmor, you may meet some errors or
+warnings from your security module. Because aufs accesses its branches
+internally, your security module may detect, report, or prohibit it.
+The behaviour depends highly upon your security module and its
+configuration.
+In this case, you can use \[oq]dlgt\[cq] mount option, too.
+Your LSM will see the
+aufs kernel threads accessing the branch, instead of your
+application.
+
+The delegation may damage the performance since it includes
+task-switch (scheduling) and waits for the thread to complete the
+delegated access. You should consider increasing the number of
+kernel threads by specifying the aufs module parameter \[oq]nwkq.\[cq]
+
+Currently, aufs does NOT delegate it at mount and remount time.
+The default is nodlgt which means aufs does not delegate the internal
+access.
+.\" .
+.\" .TP
+.\" .B dirperm1
+.\" .TQ
+.\" .B nodirperm1
+.\" By default (nodirperm1), aufs checks the permission bits of target
+.\" directory on all branches. If any of them refused the requested
+.\" access, then aufs returns negative even if the topmost permission bits
+.\" of the directory allowed the access.
+.\" If you enable CONFIG_AUFS_DLGT and specify \[oq]dirperm1\[cq] option, aufs
+.\" doesn\[aq]t check the directories on all lower branches but the topmost
+.\" one.
+.
+.TP
+.B shwh
+.TQ
+.B noshwh
+By default (noshwh), aufs doesn\[aq]t show the whiteouts and
+they just hide the same named entries in the lower branches. The
+whiteout itself never appears either.
+If you enable CONFIG_AUFS_SHWH and specify \[oq]shwh\[cq] option, aufs
+will show you the names of whiteouts
+while keeping their feature to hide the lower entries.
+Honestly speaking, I am rather confused with this \[oq]visible whiteouts.\[cq]
+But a user who originally requested this feature wrote a nice how-to
+document about this feature. See Tips file in the aufs CVS tree.
+
+.\" ----------------------------------------------------------------------
+.SH Module Parameters
+.TP
+.B nwkq=N
+The number of kernel threads named \*[AUFS_WKQ_NAME].
+
+Those threads stay in the system while the aufs module is loaded,
+and handle the special I/O requests from aufs.
+The default value is \*[AUFS_NWKQ_DEF].
+
+The special I/O requests from aufs include a part of copy-up, lookup,
+directory handling, pseudo-link, xino file operations and the
+delegated access to branches.
+For example, Unix filesystems allow you to rmdir(2) a directory which has no write
+permission bit, if its parent directory has write permission bit. In aufs, the
+directory being removed may or may not have whiteout or \[oq]dir opaque\[cq] mark as its
+child. And aufs needs to unlink(2) them before rmdir(2).
+Therefore aufs delegates the actual unlink(2) and rmdir(2) to another kernel
+thread which has been created already and has a superuser privilege.
+
+If you enable CONFIG_SYSFS, you can check this value through
+<sysfs>/module/aufs/parameters/nwkq.
+
+So how many threads are enough? You can check it by
+<sysfs>/fs/aufs/stat, if you enable CONFIG_AUFS_SYSAUFS (for
+linux\-2.6.24 and earlier) or CONFIG_AUFS_STAT (for linux\-2.6.25 and
+later) too.
+It shows the maximum number of the enqueued work
+at a time per thread. Usually they are all small numbers or
+0. If your workload is heavy
+and you feel the response is slow, then check these values. If there
+is no zero and any of them is larger than 2 or 3, you should set the \[oq]nwkq\[cq]
+module parameter greater than the default value.
+But if the reason for the bad response is in your branch filesystem,
+increasing the number of aufs threads will not help you.
+
+The last number in <sysfs>/fs/aufs/stat after comma is the maximum
+number of the \[oq]no-wait\[cq] enqueued work at a time. Aufs enqueues such
+work to the system global workqueue called \[oq]events\[cq], but does not wait
+for its completion. Usually it does no harm to the time-performance of
+aufs.
+.
+.TP
+.B brs=1 | 0
+Specifies whether to use the branch path data file under sysfs or not.
+
+If the number of your branches is large or their path is long
+and you meet the limitation of mount(8) or /etc/mtab, you need to
+enable CONFIG_SYSFS and set aufs module parameter brs=1.
+If your linux version is linux\-2.6.24 or earlier, you need to enable
+CONFIG_AUFS_SYSAUFS too.
+
+When this parameter is set as 1, aufs does not show \[oq]br:\[cq] (or dirs=)
+mount option through /proc/mounts, and /sbin/mount.aufs does not put it
+to /etc/mtab. So you can keep yourself from the page limitation of
+mount(8) or /etc/mtab.
+Aufs shows branch paths through <sysfs>/fs/aufs/si_XXX/brNNN.
+Actually the file under sysfs has also a size limitation, but I don\[aq]t
+think it is harmful.
+
+The default is brs=0, which means <sysfs>/fs/aufs/si_XXX/brNNN does not exist
+and \[oq]br:\[cq] option will appear in /proc/mounts, and /etc/mtab if you
+install /sbin/mount.aufs.
+If you did not enable CONFIG_AUFS_SYSAUFS (for
+linux\-2.6.24 and earlier), this parameter will be
+ignored.
+
+There is one more side effect in setting this parameter to 1.
+If you rename your branch, the branch path written in /etc/mtab will become
+obsolete and the future remount will meet some error due to the
+unmatched parameters (Remember that mount(8) may take the options from
+/etc/mtab and pass them to the systemcall).
+If you set 1, /etc/mtab will not hold the branch path and you will not
+meet such trouble. On the other hand, /proc/mounts which holds the
+branch path is updated dynamically. So it never becomes obsolete.
+.
+.TP
+.B sysrq=key
+Specifies MagicSysRq key for debugging aufs.
+You need to enable both of CONFIG_MAGIC_SYSRQ and CONFIG_AUFS_DEBUG.
+If your linux version is linux\-2.6.24 or earlier, you need to enable
+CONFIG_AUFS_SYSAUFS too.
+Currently this is for developers only.
+The default is \[oq]a\[cq].
+
+.\" ----------------------------------------------------------------------
+.SH Branch Syntax
+.TP
+.B dir_path[ =permission [ + attribute ] ]
+.TQ
+.B permission := rw | ro | rr
+.TQ
+.B attribute := wh | nolwh
+dir_path is a directory path.
+The keyword after \[oq]dir_path=\[cq] is the
+permission flag for that branch.
+Comma, colon and the permission flags string (including \[oq]=\[cq]) in the path
+are not allowed.
+Any filesystem can be a branch, except aufs and unionfs.
+If you specify aufs or unionfs as a branch, aufs will return an error
+saying it is overlapped or nested.
+If you enable CONFIG_AUFS_ROBR, you can use aufs as a non-writable
+branch of another aufs.
+
+Cramfs in linux stable release has strange inodes and it makes aufs
+confused. For example,
+.nf
+$ mkdir -p w/d1 w/d2
+$ > w/z1
+$ > w/z2
+$ mkcramfs w cramfs
+$ sudo mount -t cramfs -o ro,loop cramfs /mnt
+$ find /mnt -ls
+    76    1 drwxr-xr-x   1 jro      232            64 Jan  1  1970 /mnt
+     1    1 drwxr-xr-x   1 jro      232             0 Jan  1  1970 /mnt/d1
+     1    1 drwxr-xr-x   1 jro      232             0 Jan  1  1970 /mnt/d2
+     1    1 -rw-r--r--   1 jro      232             0 Jan  1  1970 /mnt/z1
+     1    1 -rw-r--r--   1 jro      232             0 Jan  1  1970 /mnt/z2
+.fi
+
+All of these two directories and two files have the same inode with one
+as their link count. Aufs cannot handle such inode correctly.
+Currently, aufs involves a tiny workaround for such inodes. But some
+applications may not work correctly since aufs inode number for such
+inode will change silently.
+If you do not have any empty files, empty directories or special files,
+inodes on cramfs will be all fine.
+
+A branch should not be shared as the writable branch between multiple
+aufs. A readonly branch can be shared.
+
+The maximum number of branches is configurable at compile time.
+The current value is \*[AUFS_BRANCH_MAX] which depends upon
+configuration.
+
+When an unknown permission or attribute is given, aufs sets ro to that
+branch silently.
+
+.SS Permission
+.
+.TP
+.B rw
+Readable and writable branch. Set as default for the first branch.
+If the branch filesystem is mounted as readonly, you cannot set it \[oq]rw.\[cq]
+.\" A filesystem which does not support link(2) and i_op\->setattr(), for
+.\" example FAT, will not be used as the writable branch.
+.
+.TP
+.B ro
+Readonly branch and it has no whiteouts on it.
+Set as default for all branches except the first one. Aufs never issues
+either a write operation or a whiteout lookup operation to this branch.
+.
+.TP
+.B rr
+Real readonly branch, special case of \[oq]ro\[cq], for natively readonly
+branch. Assuming the branch is natively readonly, aufs can optimize
+some internal operation. For example, if you specify \[oq]udba=inotify\[cq]
+option, aufs does not set inotify for the things on rr branch.
+Set by default for a branch whose fs-type is either \[oq]iso9660\[cq],
+\[oq]cramfs\[cq], \[oq]romfs\[cq] or \[oq]squashfs.\[cq]
+
+.SS Attribute
+.
+.TP
+.B wh
+Readonly branch and it has/might have whiteouts on it.
+Aufs never issues a write operation to this branch, but does look up whiteouts.
+Use this as \[oq]<branch_dir>=ro+wh\[cq].
+.
+.TP
+.B nolwh
+Usually, aufs creates a whiteout as a hardlink on a writable
+branch. This attribute prohibits aufs from creating the hardlinked
+whiteout, including the source file of all hardlinked whiteouts
+(\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME].)
+If you do not like a hardlink, or your writable branch does not support
+link(2), then use this attribute.
+But I am afraid a filesystem which does not support link(2) natively
+will fail in other places such as copy-up.
+Use this as \[oq]<branch_dir>=rw+nolwh\[cq].
+Also you may want to try \[oq]noplink\[cq] mount option, while it is not recommended.
+
+.\" ----------------------------------------------------------------------
+.SH External Inode Number Bitmap and Translation Table (xino)
+Aufs uses one external bitmap file and one external inode number
+translation table file per aufs and per branch
+filesystem by
+default. The bitmap is for recycling aufs inode numbers and the others
+are tables for converting an inode number on a branch to
+an aufs inode number. The default path
+is \[oq]first writable branch\[cq]/\*[AUFS_XINO_FNAME].
+If there is no writable branch, the
+default path
+will be \*[AUFS_XINO_DEFPATH].
+.\" A user who executes mount(8) needs the privilege to create xino
+.\" file.
+
+Those files are always opened and read/written by aufs frequently.
+If your writable branch is on a flash memory device, it is recommended
+to put xino files on other than flash memory by specifying \[oq]xino=\[cq]
+mount option.
+
+The
+maximum file size of the bitmap is, basically, the total
+number of all the files on all branches divided by 8 (the number of
+bits in a byte).
+For example, on a 4KB page size system, if you have 32,768 (or
+2,599,968) files in aufs world,
+then the maximum file size of the bitmap is 4KB (or 320KB).
+
+The
+maximum file size of the table will
+be \[oq]max inode number on the branch x size of an inode number\[cq].
+For example in a 32bit environment,
+
+.nf
+$ df -i /branch_fs
+/dev/hda14           2599968  203127 2396841    8% /branch_fs
+.fi
+
+and /branch_fs is a branch of the aufs. When the inode number is
+assigned contiguously (without \[oq]hole\[cq]), the maximum xino file size for
+/branch_fs will be 2,599,968 x 4 bytes = about 10 MB. But not all of the
+disk blocks might be allocated.
+When the inode number is assigned discontinuously, the maximum size of
+xino file will be the largest inode number on a branch x 4 bytes.
+Additionally, the file size is limited to LLONG_MAX or the s_maxbytes
+in filesystem\[aq]s superblock (s_maxbytes may be smaller than
+LLONG_MAX). So the
+supportable largest inode number on a branch is less than
+2305843009213693950 (LLONG_MAX/4\-1).
+This is the current limitation of aufs.
+In a 64bit environment, this limitation becomes more strict and the
+supported largest inode number is less than LLONG_MAX/8\-1.
+
+The xino files are always hidden, i.e. removed. So you cannot
+do \[oq]ls \-l xino_file\[cq].
+If you enable CONFIG_SYSFS, you can check this information through
+<sysfs>/fs/aufs/<si_id>/xino (for linux\-2.6.24 and earlier, you
+need to enable CONFIG_AUFS_SYSAUFS too).
+The first line in <sysfs>/fs/aufs/<si_id>/xino shows the information
+of the bitmap file, in the format of,
+
+.nf
+<blocks>x<block size> <file size>
+.fi
+
+Note that a filesystem usually has a
+feature called pre-allocation, which means a number of
+blocks are allocated automatically, and then deallocated
+silently when the filesystem thinks they are unnecessary.
+You do not have to be surprised by sudden changes of the number of
+blocks, when the filesystem where xino files are placed supports the
+pre-allocation feature.
+
+The remaining lines are hidden xino file information in the format of,
+
+.nf
+<branch index>: <file count>, <blocks>x<block size> <file size>
+.fi
+
+If the file count is larger than 1, it means some of your branches are
+on the same filesystem and the xino file is shared by them.
+Note that the file size may not be equal to the actual consuming blocks
+since xino file is a sparse file, i.e. it has holes which do not
+consume any disk blocks.
+
+Once you unmount aufs, the xino files for that aufs are totally gone.
+It means that the inode number is not permanent.
+
+The xino files should be created on a filesystem other than NFS.
+If your first writable branch is NFS, you will need to specify xino
+file path other than NFS.
+Also if you are going to remove the branch where xino files exist or
+change the branch permission to readonly, you need to use xino option
+before del/mod the branch.
+
+The bitmap file can be truncated.
+For example, if you delete a branch which has a huge number of files,
+many inode numbers will be recycled and the bitmap will be truncated
+to a smaller size. Aufs does this automatically when a branch is
+deleted.
+You can truncate it anytime you like if you specify \[oq]trunc_xib\[cq] mount
+option. But when the accessed inode number was not deleted, nothing
+will be truncated.
+If you do not want to truncate it (it may be slow) when you delete a
+branch, specify \[oq]notrunc_xib\[cq] after \[oq]del\[cq] mount option.
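The sizing rules given earlier in this section can be sketched as a small shell calculation. This is only an illustration: the helper names xino_table_bytes and xib_bytes are hypothetical and not part of aufs, the 4-byte entry width is taken from the 32bit example, and rounding the bitmap up to a whole page is an assumption made here to reproduce the 4KB and 320KB figures.

```shell
#!/bin/sh
# Hypothetical helpers for estimating xino sizes; not part of aufs.

# Upper bound of one xino table: largest inode number x entry width.
# $1 = largest inode number on the branch
# $2 = bytes per entry (4 in the 32bit example; assumed 8 on 64bit)
xino_table_bytes() {
	echo $(( $1 * $2 ))
}

# Upper bound of the bitmap: one bit per file, rounded up to a page.
# $1 = number of files on all branches
# $2 = page size in bytes (rounding to a page is an assumption)
xib_bytes() {
	bytes=$(( ($1 + 7) / 8 ))
	echo $(( (bytes + $2 - 1) / $2 * $2 ))
}

xino_table_bytes 2599968 4   # table bound for the df -i example
xib_bytes 32768 4096         # bitmap bound for 32,768 files
xib_bytes 2599968 4096       # bitmap bound for 2,599,968 files
```

The three calls print 10399872, 4096 and 327680, i.e. about 10 MB, 4KB and 320KB. Since xino files are sparse, these are upper bounds on the file size and not on the consumed disk blocks.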
+
+If you do not want to use xino, use noxino mount option. Use this
+option with care, since the inode number may be changed silently and
+unexpectedly anytime.
+For example,
+rmdir failure, recursive chmod/chown/etc to a large and deep directory
+or anything else.
+And some applications will not work correctly.
+.\" When the inode number has been changed, your system
+.\" can be crazy.
+If you want to change the xino default path, use xino mount option.
+
+After you add branches, the persistence of inode number may not be
+guaranteed.
+At remount time, cached but unused inodes are discarded.
+And the newly appeared inode may have a different inode number at the
+next access time. The inodes in use have the persistent inode number.
+
+When aufs has assigned an inode number to a file and you create the
+same named file on the upper branch directly, then the next time you
+access the file, aufs may assign another inode number to the file even
+if you use xino option.
+Some applications may treat the file whose inode number has been
+changed as a totally different file.
+
+.\" ----------------------------------------------------------------------
+.SH Pseudo Link (hardlink over branches)
+Aufs supports \[oq]pseudo link\[cq] which is a logical hard-link over
+branches (cf. ln(1) and link(2)).
+In other words, it covers a file copied-up by link(2) and a copied-up file which was
+hard-linked on a readonly branch filesystem.
+
+When you have files named fileA and fileB which are
+hardlinked on a readonly branch, if you write something into fileA,
+aufs copies-up fileA to a writable branch, and write(2) the originally
+requested thing to the copied-up fileA. On the writable branch,
+fileA is not hardlinked.
+But aufs remembers it was hardlinked, and handles fileB as if it existed
+on the writable branch, by referencing fileA\[aq]s inode on the writable
+branch as fileB\[aq]s inode.
+
+Once you unmount aufs, the plink info for that aufs kept in memory is totally
+gone.
+It means that the pseudo-link is not permanent.
+If you want to make plink permanent, try \[oq]auplink\[cq] script just before
+one of these operations,
+unmounting your aufs,
+using \[oq]ro\[cq] or \[oq]noplink\[cq] mount option,
+deleting a branch from aufs,
+adding a branch into aufs,
+or changing your writable branch to readonly.
+
+This script reproduces all real hardlinks on a writable branch by linking
+them, and removes pseudo-link info in memory and temporary link on the
+writable branch.
+Since this script accesses your branches directly, you cannot hide them by
+\[oq]mount \-\-bind /tmp /branch\[cq] or something.
+
+If you are willing to rebuild your aufs with the same branches later, you
+should use auplink script before you umount your aufs.
+If you installed both of /sbin/mount.aufs and /sbin/umount.aufs, and your
+mount(8) and umount(8) support them, and /etc/default/auplink is configured,
+\[oq]auplink\[cq] script will be executed automatically and flush pseudo-links.
+
+The /etc/default/auplink is a simple shell script which does nothing but define
+$FLUSH. If your aufs mount point is set in $FLUSH, \[oq]auplink\[cq] flushes
+the pseudo-links on that mount point.
+If $FLUSH is set to \[oq]ALL\[cq], \[oq]auplink\[cq] will be executed for every aufs.
+
+The \[oq]auplink\[cq] script uses \[oq]aulchown\[cq] binary, so you need to install it too.
+The \[oq]auplink\[cq] script executes \[oq]find\[cq] and \[oq]mount \-o remount\[cq], which may take a
+long time and impact the later system performance.
+If you did not install /sbin/mount.aufs, /sbin/umount.aufs or /sbin/auplink,
+but you want to flush pseudo-links, then you need to execute \[oq]auplink\[cq] manually.
+If you installed and configured them, but do not want to execute \[oq]auplink\[cq] at
+umount time, then use \[oq]\-i\[cq] option for umount(8).
+
+.nf
+# auplink /your/aufs/root flush
+# umount /your/aufs/root
+or
+# auplink /your/aufs/root flush
+# mount -o remount,mod:/your/writable/branch=ro /your/aufs/root
+or
+# auplink /your/aufs/root flush
+# mount -o remount,noplink /your/aufs/root
+or
+# auplink /your/aufs/root flush
+# mount -o remount,del:/your/aufs/branch /your/aufs/root
+or
+# auplink /your/aufs/root flush
+# mount -o remount,append:/your/aufs/branch /your/aufs/root
+.fi
+
+The plinks are kept both in memory and on disk. When they consume too many
+resources on your system, you can use the \[oq]auplink\[cq] script at anytime and
+throw away the unnecessary pseudo-links safely.
+
+Additionally, the \[oq]auplink\[cq] script is very useful for some security reasons.
+For example, when you have a directory whose permission flags
+are 0700, and a file which is 0644 under the 0700 directory. Usually,
+all files under the 0700 directory are private and no one else can see
+the file. But when the directory is 0711 and someone else knows the 0644
+filename, he can read the file.
+
+Basically, aufs pseudo-link feature creates a temporary link under the
+directory whose owner is root and the permission flags are 0700.
+But when the writable branch is NFS, aufs sets 0711 to the directory.
+When the 0644 file is pseudo-linked, the temporary link (whose
+contents are, of course, totally equivalent) will be created under the
+0711 directory. The filename will be generated by its inode number.
+While it is hard to know the generated filename, someone else may try peeping
+the temporary pseudo-linked file by his software tool which may try the name
+from one to MAX_INT or something.
+In this case, the 0644 file will be read unexpectedly.
+I am afraid that leaving the temporary pseudo-links can be a security hole.
+It makes sense to execute \[oq]auplink /your/aufs/root flush\[cq]
+periodically, when your writable branch is NFS.
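A periodic flush could be driven by a small wrapper such as the following sketch. It is an assumption-laden illustration and not part of aufs: the helper names aufs_mount_points and print_flush_commands are hypothetical, the mount table is assumed to be in /proc/mounts format, and the script only prints the \[oq]auplink <mount point> flush\[cq] commands shown above (a dry run) instead of executing them.

```shell
#!/bin/sh
# Hypothetical helpers, not part of aufs.

# Print the mount points whose filesystem type is aufs.
# $1 = mounts-format table to read (defaults to /proc/mounts)
aufs_mount_points() {
	awk '$3 == "aufs" { print $2 }' "${1:-/proc/mounts}"
}

# Dry run: print the flush command for every aufs mount point
# instead of executing it. Running auplink itself needs root.
print_flush_commands() {
	aufs_mount_points "$1" | while read -r mnt
	do
		echo "auplink $mnt flush"
	done
}

if [ -r /proc/mounts ]
then
	print_flush_commands
fi
```

To actually flush, the printed commands would have to be run as root, for example from a periodic job. Mount points containing spaces appear escaped in /proc/mounts and are not handled by this sketch.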
+ +When your writable branch is not NFS, or all users are careful enough to set 0600 +to their private files, you do not have to worry about this issue. + +If you do not want this feature, use \[oq]noplink\[cq] mount option and you do +not need +to install \[oq]auplink\[cq] script and \[oq]aulchown\[cq] binary. + +.SS The behaviours of plink and noplink +This sample shows that the \[oq]f_src_linked2\[cq] with \[oq]noplink\[cq] option cannot follow +the link. + +.nf +none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,br:/dev/shm/rw=rw:/dev/shm/ro=ro) +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +ls: ./copied: No such file or directory +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked +22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2 +$ echo abc >> f_src_linked +$ cp f_src_linked copied +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +36 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ../rw/f_src_linked +53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied +22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked +22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2 +$ cmp copied f_src_linked2 +$ + +none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,noplink,br:/dev/shm/rw=rw:/dev/shm/ro=ro) +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +ls: ./copied: No such file or directory +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked +23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2 +$ echo abc >> f_src_linked +$ cp f_src_linked copied +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +17 -rw-r--r-- 2 jro jro 2 
Dec 22 11:03 ../ro/f_src_linked2 +36 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ../rw/f_src_linked +53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied +23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked +23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2 +$ cmp copied f_src_linked2 +cmp: EOF on f_src_linked2 +$ +.fi + +.\" +.\" If you add/del a branch, or link/unlink the pseudo-linked +.\" file on a branch +.\" directly, aufs cannot keep the correct link count, but the status of +.\" \[oq]pseudo-linked.\[cq] +.\" Those files may or may not keep the file data after you unlink the +.\" file on the branch directly, especially the case of your branch is +.\" NFS. + +If you add a branch which has fileA or fileB, aufs does not follow the +pseudo-link. The file on the added branch has no relation to the same +named file(s) on the lower branch(es). +If you use the noxino mount option, pseudo-links will not work after the +kernel shrinks the inode cache. + +This feature will not work for squashfs before version 3.2 since its +inode is tricky. +When the inode is hardlinked, squashfs inodes have the same inode +number and correct link count, but the in-memory inode objects are +different. Squashfs inodes (before v3.2) are generated for each name, even +if they are hardlinked. + +.\" ---------------------------------------------------------------------- +.SH User\[aq]s Direct Branch Access (UDBA) +UDBA means modifying a branch filesystem manually or directly, +i.e. bypassing aufs. +While aufs is designed and implemented to be safe after UDBA, +it can confuse you and your aufs. Some information, such as the +aufs inode, will be incorrect. +For example, if you rename a file on a branch directly, the file on +aufs may +or may not be accessible through both the old and new names. +This is because aufs caches various information about the files on +branches, and the cache remains after UDBA.
+ +Aufs has a mount option named \[oq]udba\[cq] which specifies the level of testing, at +access time, for whether UDBA has happened or not. +. +.TP +.B udba=none +Aufs trusts the dentry and the inode cache on the system, and never +tests for UDBA. With this option, aufs runs fastest, but it may show +you incorrect data. +Additionally, if you often modify a branch +directly, aufs will not be able to trace the changes of inodes on the +branch. It can cause wrong behaviour, a deadlock or anything else. + +It is recommended to use this option only when you are sure that +nobody accesses a file on a branch directly. +It might be difficult for you to achieve a real \[oq]no UDBA\[cq] world when you +cannot stop your users from doing \[oq]find / \-ls\[cq] or something similar. +If you really want to forbid all of your users from UDBA, here is a trick +for it. +With this trick, users cannot see the +branches directly and aufs runs with no problem, except for the \[oq]auplink\[cq] script. +But if you are not familiar with aufs, this trick may +confuse you. + +.nf +# d=/tmp/.aufs.hide +# mkdir $d +# for i in $branches_you_want_to_hide +> do +> mount -n --bind $d $i +> done +.fi + +When you unmount the aufs, delete/modify the branch by remount, or +want to show the hidden branches again, unmount the bound +/tmp/.aufs.hide. + +.nf +# umount -n $branches_you_want_to_unbound +.fi + +If you use a FUSE filesystem which supports hardlinks as an aufs branch, +you should not set this option, since FUSE makes an inode object for +each hardlink (at least in linux\-2.6.23). When your FUSE filesystem +maintains them at link/unlink time, it is equivalent +to \[oq]direct branch access\[cq] for aufs. + +. +.TP +.B udba=reval +Aufs tests only the existence of a file which existed before. If +the existing file was removed on the branch directly, aufs +discards the cache about the file and +looks it up again. So the data will be updated. +This test is kept at the minimum level to preserve performance and ensure the +existence of a file.
+This is the default, and aufs still runs fast. + +This rule leads to some unexpected situations, but I hope they are +harmless. They depend entirely upon the cache. Here are just a few +examples. +. +.RS +.Bu +If the file is cached as negative or +non-existent, aufs does not test it. The file is still handled as +negative after a user creates the file on a branch directly. If the +file is not cached, aufs will look it up normally and find the file. +. +.Bu +When the file is cached as positive or existing, and a user creates a +file with the same name directly on the upper branch, aufs detects that the cached +inode of the file still exists and will show you the old (cached) +file which is on the lower branch. +. +.Bu +When the file is cached as positive or existing, and a user renames the +file by rename(2) directly, aufs detects that the inode of the file +still exists. You may or may not see both the old and new files. +Todo: If aufs also tests the name, we can detect this case. +.RE + +If your outer modification (UDBA) is rare and you can ignore the +temporary and minor differences between the virtual aufs world and the real +branch filesystem, then try this mount option. +.\" And when you modify a branch directly, set udba=inotify temporary +.\" before the modification and set udba=reval again after that. +. +.TP +.B udba=inotify +Aufs sets \[oq]inotify\[cq] on all the accessed directories on its branches +and receives the events about each dir and its children. It consumes +resources, cpu and memory. I am afraid that the performance will +suffer, but it is the strictest test level. +There are some limitations in linux inotify, see also Linux Inotify +Limitation. +So it is recommended to leave udba at its default usually, and set it +to inotify by remount when you need it. + +When a user accesses a file for which UDBA was notified before, the cached data +about the file will be discarded and aufs looks it up again. So the data will +be updated.
+When an error condition occurs between UDBA and an aufs operation, aufs +will return an error, including EIO. +To use this option, you need linux\-2.6.18 or later, and need to +enable CONFIG_INOTIFY and CONFIG_AUFS_UDBA_INOTIFY. + +Renaming or removing a directory on a branch directly may reveal the same named +directory on the lower branch. Aufs tries looking up the renamed +directory and the revealed directory again and assigning different inode +numbers to them. But the inode numbers, including those of their children, can be a +problem. The inode numbers will be changed silently, and +aufs may produce a warning. If you rename a directory repeatedly and +reveal/hide the lower directory, then aufs may confuse their inode +numbers too. It depends upon the system cache. + +When you make a directory in aufs and mount another filesystem on it, +the directory in aufs cannot be removed, as expected, because it is a +mount point. But the same named directory on the writable branch can +be removed, if someone wants. It is just an empty directory, not +a mount point. +Aufs cannot stop such a direct rmdir, but produces a warning about it. + + +.\" ---------------------------------------------------------------------- +.SH Linux Inotify Limitation +Unfortunately, current inotify (linux\-2.6.18) has some limitations, +and aufs inherits them. I am going to address some harmful cases. + +.SS IN_ATTRIB, updating atime +When a file/dir on a branch is accessed directly, the inode atime (access +time, cf. stat(2)) may or may not be updated. In some cases, inotify +does not fire this event. So the aufs inode atime may remain old. + +.SS IN_ATTRIB, updating nlink +When the link count of a file on a branch is incremented by link(2) +directly, +inotify fires IN_CREATE to the parent +directory, but not IN_ATTRIB to the file. So the aufs inode nlink may +remain old. + +.SS IN_DELETE, removing file on NFS +When a file on an NFS branch is deleted directly, inotify may or may +not fire +the IN_DELETE event.
It depends upon the status of the dentry +(DCACHE_NFSFS_RENAMED flag). +In this case, the file on aufs still seems to exist. Aufs and any user can see +the file. + +.SS IN_IGNORED, deleted rename target +When a file/dir on a branch is unlinked by rename(2) directly, inotify +fires IN_IGNORED which means the inode is deleted. Actually, in some +cases, the inode survives. For example, the rename target is linked or +opened. In this case, the inotify watch set by aufs is removed by VFS and +inotify. +Aufs cannot receive the events anymore. So aufs may show you +incorrect data about the file/dir. + +.\" ---------------------------------------------------------------------- +.SH Policies to Select One among Multiple Writable Branches +Aufs has some policies to select one among multiple writable branches +when you are going to write/modify something. There are two kinds of +policies: one is for newly creating something and the other is for +internal copy-up. +You can select them by specifying the mount option \[oq]create=CREATE_POLICY\[cq] +or \[oq]cpup=COPYUP_POLICY.\[cq] +These policies have no meaning when you have only one writable +branch. If they have any effect in that case, it can only be to damage the performance. + +.SS Exceptions for Policies +In every case below, even if the policy says that the branch where a +new file should be created is /rw2, the file will be created on /rw1. +. +.Bu +If there is a readonly branch with the \[oq]wh\[cq] attribute above the +policy-selected branch and the parent dir is marked as opaque, +or the target (creating) file is whiteouted on the ro+wh branch, then +the policy will be ignored and the target file will be created on the +nearest writable branch above the ro+wh branch. +.RS +.nf +/aufs = /rw1 + /ro+wh/diropq + /rw2 +/aufs = /rw1 + /ro+wh/wh.tgt + /rw2 +.fi +.RE +.
+.Bu +If there is a writable branch above the policy-selected branch and the +parent dir is marked as opaque or the target file is whiteouted on the +branch, then the policy will be ignored and the target file will be +created on the highest one among the upper writable branches which has the +diropq or whiteout. In case of whiteout, aufs removes it as usual. +.RS +.nf +/aufs = /rw1/diropq + /rw2 +/aufs = /rw1/wh.tgt + /rw2 +.fi +.RE +. +.Bu +link(2) and rename(2) systemcalls are exceptions in every policy. +They try to select the branch where the source exists whenever possible, since +copying up a large file takes a long time. If that is not possible, i.e. the +branch where the source exists is readonly, then they will follow the +copyup policy. +. +.Bu +There is an exception for rename(2) when the target exists. +If the rename target exists, aufs compares the indices of the branches +where the source and the target exist and selects the higher +one. If the selected branch is readonly, then aufs follows the copyup +policy. + +.SS Policies for Creating +. +.TP +.B create=tdp | top\-down\-parent +Selects the highest writable branch where the parent dir exists. If +the parent dir does not exist on a writable branch, then the internal +copyup will happen. The policy for this copyup is always \[oq]bottom-up.\[cq] +This is the default policy. +. +.TP +.B create=rr | round\-robin +Selects a writable branch in round robin. When you have two writable +branches and create 10 new files, 5 files will be created on each +branch. +The mkdir(2) systemcall is an exception. When you create 10 new directories, +all are created on the same branch. +. +.TP +.B create=mfs[:second] | most\-free\-space[:second] +Selects the writable branch which has the most free space. In order to keep +the performance, you can specify the duration (\[oq]second\[cq]) which makes +aufs hold the index of the last selected writable branch until the +specified number of seconds expires.
The first time you create something in aufs +after the specified seconds have expired, aufs checks the amount of free +space of all writable branches by an internal statfs call +and the held branch index will be updated. +The default value is \*[AUFS_MFS_SECOND_DEF] seconds. + +In this mode, a FUSE branch needs special attention. +The struct fuse_operations has a statfs operation. It is OK, but the +parameter is struct statvfs* instead of struct statfs*. So almost +all user\-space implementations will call statvfs(3)/fstatvfs(3) instead of +statfs(2)/fstatfs(2). +In glibc, [f]statvfs(3) issues [f]statfs(2), open(2)/read(2) for +/proc/mounts, +and stat(2) for the mountpoint. In this situation, a FUSE branch will +cause a deadlock when creating something in aufs. Here is a sample +scenario, +.\" .RS +.\" .IN -10 +.Bu +create a file just under the aufs root dir. +.Bu +aufs will acquire a write-lock for the parent directory. +.Bu +aufs may call statfs internally for each writable branch to decide the +branch which has the most free space. +.Bu +FUSE in kernel\-space converts and redirects the statfs request to the +user\-space. +.Bu +the user-space statfs handler will call [f]statvfs(3). +.Bu +the [f]statvfs(3) in glibc will access /proc/mounts and issue +stat(2) for the mountpoint. But those require a read-lock for the aufs +root directory. +.Bu +Then a deadlock occurs. +.\" .RE 1 +.\" .IN + +In order to avoid this deadlock, I would suggest not calling +[f]statvfs(3) from the statfs handler. Here is sample code which calls it once at startup and uses [f]statfs(2) in the handler. +.nf +struct statvfs stvfs; + +main() +{ + [f]statvfs(..., &stvfs); +} + +statfs_handler(const char *path, struct statvfs *arg) +{ + struct statfs stfs; + [f]statfs(..., &stfs); + memcpy(arg, &stvfs, sizeof(stvfs)); + arg->f_bfree = stfs.f_bfree; + arg->f_bavail = stfs.f_bavail; + arg->f_ffree = stfs.f_ffree; + arg->f_favail = /* any value */; +} +.fi +. +.TP +.B create=mfsrr:low[:second] +Selects a writable branch in most-free-space mode first, and then +round-robin mode.
If the selected branch has less free space than the +specified value \[oq]low\[cq] in bytes, then aufs re-tries in round-robin mode. +To compute the value, try the shell arithmetic expansion which is defined by POSIX. +For example, $((10 * 1024 * 1024)) for 10M. +You can also specify the duration (\[oq]second\[cq]) which is equivalent to +that of the \[oq]mfs\[cq] mode. +. +.TP +.B create=pmfs[:second] +Selects a writable branch where the parent dir exists, as in tdp +mode. When the parent dir exists on multiple writable branches, aufs +selects the one which has the most free space, as in mfs mode. + +.SS Policies for Copy-Up +. +.TP +.B cpup=tdp | top\-down\-parent +Equivalent to the same named policy for create. +This is the default policy. +. +.TP +.B cpup=bup | bottom\-up\-parent +Selects the writable branch where the parent dir exists and which +is the nearest one above the copyup-source. +. +.TP +.B cpup=bu | bottom\-up +Selects the nearest writable branch above the copyup-source, +regardless of the existence of the parent dir. + +.\" ---------------------------------------------------------------------- +.SH Exporting Aufs via NFS +Aufs supports NFS-exporting in linux\-2.6.18 and later. +Since aufs has no actual block device, you need to add the NFS \[oq]fsid\[cq] option when +exporting. Refer to the NFS manual for the details of this option. + +In linux\-2.6.23 and earlier, +it is recommended to export your branch filesystems once before +exporting aufs. By exporting once, the branch filesystem internal +pointer named find_exported_dentry is initialized. After this +initialization, you may unexport them. +Additionally, this initialization should be done per +filesystem type. If your branches are all the same filesystem +type, you need to export just one of them once. +If you have never exported a filesystem which is used in your +branches, aufs will initialize the internal pointer with the default +value, and produce a +warning.
While it will work correctly, I am afraid it will be unsafe +in the future. +In linux\-2.6.24 and later, this exporting is unnecessary. + +Additionally, there are several limitations or requirements. +.RS +.Bu +The version of the linux kernel must be linux\-2.6.18 or later. +.Bu +You need to enable CONFIG_AUFS_EXPORT. +.Bu +The branch filesystem must support NFS-exporting. For example, tmpfs in +linux\-2.6.18 (or earlier) does not support it. +.Bu +NFSv2 is not supported. When you mount the exported aufs from your NFS +client, you will need to specify some NFS options such as v3 or nfsvers=3, +especially if it is nfsroot. +.Bu +If the size of the NFS file handle on your branch filesystem is large, +aufs will +not be able to handle it. The maximum size of an NFSv3 file +handle for a filesystem is 64 bytes. Aufs uses 24 bytes on a 32-bit +system, plus 12 more bytes on a 64-bit system. The rest is room for the file +handle of a branch filesystem. +.Bu +The External Inode Number Bitmap and Translation Table (xino) is +required since the NFS file +handle is based upon the inode number. The mount option \[oq]xino\[cq] is enabled +by default. +.Bu +The branch filesystems must be accessible, which means \[oq]not hidden.\[cq] +It means you need to \[oq]mount \-\-move\[cq] when you use initramfs and +switch_root(8), or chroot(8). +.\" .Bu +.\" The \[oq]noplink\[cq] option is recommended, currently. +.\" .Bu +.\" If you add/del branches many times between the accesses to the same file +.\" from the same NFS client, +.\" and the number of the add/del operation is greater than the maximum +.\" number of branches, then aufs may not handle the request from the NFS +.\" client correctly. +.RE + +.\" ---------------------------------------------------------------------- +.SH Dentry and Inode Caches +If you want to clear caches on your system, there are several tricks +for that. If your system ram is low, +try \[oq]find /large/dir \-ls > /dev/null\[cq]. +It will read many inodes and dentries and cache them.
Then old caches will be +discarded. +But when you have large ram or you do not have such large +directory, it is not effective. + +If you want to discard cache within a certain filesystem, +try \[oq]mount \-o remount /your/mntpnt\[cq]. Some filesystem may return an error of +EINVAL or something, but VFS discards the unused dentry/inode caches on the +specified filesystem. + +.\" ---------------------------------------------------------------------- +.SH Compatible/Incompatible with Unionfs Version 1.x Series +If you compile aufs with \-DCONFIG_AUFS_COMPAT, dirs= option and =nfsro +branch permission flag are available. They are interpreted as +br: option and =ro flags respectively. + \[oq]debug\[cq], \[oq]delete\[cq], \[oq]imap\[cq] options are ignored silently. When you +compile aufs without \-DCONFIG_AUFS_COMPAT, these three options are +also ignored, but a warning message is issued. + +Ignoring \[oq]delete\[cq] option, and to keep filesystem consistency, aufs tries +writing something to only one branch in a single systemcall. It means +aufs may copyup even if the copyup-src branch is specified as writable. +For example, you have two writable branches and a large regular file +on the lower writable branch. When you issue rename(2) to the file on aufs, +aufs may copyup it to the upper writable branch. +If this behaviour is not what you want, then you should rename(2) it +on the lower branch directly. + +And there is a simple shell +script \[oq]unionctl\[cq] under sample subdirectory, which is compatible with +unionctl(8) in +Unionfs Version 1.x series, except \-\-query action. +This script executes mount(8) with \[oq]remount\[cq] option and uses +add/del/mod aufs mount options. +If you are familiar with Unionfs Version 1.x series and want to use unionctl(8), you can +try this script instead of using mount \-o remount,... directly. +Aufs does not support ioctl(2) interface. 
+This script depends heavily upon mount(8) in the +util\-linux\-2.12p package, and you need to mount /proc to use this script. +If your mount(8) version differs, you can try modifying this +script. It is very easy. +The unionctl script is just a sample usage of the aufs remount +interface. + +Aufs uses the external inode number bitmap and translation table by +default. + +The default branch permission for the first branch is \[oq]rw\[cq], and for the +rest it is \[oq]ro.\[cq] + +The whiteout is for hiding files on lower branches. It is also used +to stop readdir from going to lower branches. +The latter case is called \[oq]opaque directory.\[cq] Any +whiteout is an empty file, which means a whiteout is just a mark. +In the case of hiding lower files, the name of the whiteout is +\[oq]\*[AUFS_WH_PFX]<filename>.\[cq] +And in the case of stopping readdir, the name is +\[oq]\*[AUFS_WH_PFX]\*[AUFS_WH_PFX].opq\[cq] or +\[oq]\*[AUFS_WH_PFX]__dir_opaque.\[cq] The name depends upon your compile +configuration +CONFIG_AUFS_COMPAT. +.\" All of newly created or renamed directory will be opaque. +All whiteouts are hardlinked, +including \[oq]<writable branch top dir>/\*[AUFS_WH_PFX]\*[AUFS_WH_BASENAME].\[cq] + +A hardlink on an ordinary (disk based) filesystem does not +consume a new inode. But in linux tmpfs, the number of free +inodes is decremented by link(2). It is recommended to specify the +nr_inodes option to your tmpfs if you meet ENOSPC. Use this option +after checking with \[oq]df \-i.\[cq] + +When you rmdir or rename-to a dir which has a number of whiteouts, +aufs renames the dir to a temporary whiteouted-name like +\[oq]\*[AUFS_WH_PFX]<dir>.<random hex>.\[cq] Then it removes the dir after the actual operation. +cf. mount option \[oq]dirwh.\[cq] + +.\" ---------------------------------------------------------------------- +.SH Incompatible with an Ordinary Filesystem +stat(2) returns the inode info from the first existing inode among +the branches, except the directory link count.
+Aufs usually computes a directory link count larger than the exact value, in +order to keep UNIX filesystem semantics, or in order to keep find(1) quiet. +The size of a directory may be wrong too, but it should do no harm. +The timestamp of a directory will not be updated when a file is +created or removed under it on a lower branch. + +The test for permission bits has two cases. One is for a directory, +and the other is for a non-directory. In the case of a directory, aufs +checks the permission bits of all existing directories. It means you +need the correct privilege for the directories including those on the lower +branches. +.\" You can change this behaviour with \[oq]dirperm1\[cq] mount option. +The test for a non-directory is simpler. It checks only the +topmost inode. + +statfs(2) returns the first branch info except namelen. The namelen is +decreased by the whiteout prefix length. + +Remember, seekdir(3) and telldir(3) are not defined in POSIX. They may +not work as you expect. Try rewinddir(3) or re-open the dir. + +The whiteout prefix (\*[AUFS_WH_PFX]) is reserved on all branches. Users should +not handle filenames which begin with this prefix. +In order to leave room for a future whiteout, the maximum filename length is limited to +the longest value \- \*[AUFS_WH_PFX_LEN]. It may be a violation of POSIX. + +If you dislike the difference between the aufs entries in /etc/mtab +and /proc/mounts, and if you are using mount(8) in the util\-linux package, +then try the ./mount.aufs script. Copy the script to /sbin/mount.aufs. +This simple script tries updating +/etc/mtab. If you do not care about /etc/mtab, you can ignore this +script. +Remember this script depends heavily upon mount(8) in the +util\-linux\-2.12p package, and you need to mount /proc. + +Since aufs uses its own inode and dentry, your system may cache a huge +number of inodes and dentries. It can be twice as many as all of the files +in your union.
+It means that unmounting or remounting readonly at shutdown time may +take a long time, since the VFS tries freeing all of the cache +on the target filesystem. +.\" In this case, you had better try \[oq]echo 2 > /proc/sys/vm/drop_caches\[cq] +.\" just before unmounting in shutdown procedure. +.\" It frees unused inodes and dentries quickly. +.\" If your system cache is not so large, you do not need this trick. + +When you open a directory, aufs will open several directories +internally. +It means you may reach the limit on the number of file descriptors. +And when a lower directory cannot be opened, aufs will close all the +opened upper directories and return an error. + +A sub-mount under a branch +on a local filesystem +is ignored. +For example, if you have mounted another filesystem on +/branch/another/mntpnt, the files under \[oq]mntpnt\[cq] will be ignored by aufs. +It is recommended to mount the sub-mount under the mounted aufs. +For example, + +.nf +# sudo mount /dev/sdaXX /ro_branch +# d=another/mntpnt +# sudo mount /dev/sdbXX /ro_branch/$d +# mkdir -p /rw_branch/$d +# sudo mount -t aufs -o br:/rw_branch:/ro_branch none /aufs +# sudo mount -t aufs -o br:/rw_branch/${d}:/ro_branch/${d} none /aufs/$d +.fi + +There are several characters which are not allowed in a branch +directory path and xino filename. See the details in Branch Syntax and Mount +Option. + +The file-lock, which means fcntl(2) with F_SETLK, F_SETLKW or F_GETLK, flock(2) +and lockf(3), is applied to the virtual aufs file only, not to the file on a +branch. It means you can break the lock by accessing a branch directly. +TODO: check \[oq]security\[cq] to hook locks, as inotify does. + +The fsync(2) and fdatasync(2) systemcalls return 0 which means success, even +if the given file descriptor is not opened for writing. +I am afraid this behaviour may violate some standards. After checking the +behaviour of fsync(2) on ext2, aufs was made to return success.
+ +If you want to use disk-quota, you should set it up on your writable +branch since aufs does not have its own block device. + +When your aufs is the root directory of your system, and your system +tells you some of the filesystems were not unmounted cleanly, try this +procedure when you shut down your system. +.nf +# mount -no remount,ro / +# for i in $writable_branches +# do mount -no remount,ro $i +# done +.fi +If your xino file is on a hard drive, you also need to specify the +\[oq]noxino\[cq] option or \[oq]xino=/your/tmpfs/xino\[cq] when remounting the root +directory. + +rename(2) on a directory may return EXDEV even if both src and tgt +are on the same aufs. When the rename-src dir exists on multiple +branches and the lower dir has child(ren), aufs has to copyup all its +children. It can be a recursive copyup. Current aufs does not support +such a huge copyup operation at one time in kernel space; instead it +produces a warning and returns EXDEV. +Generally, mv(1) detects this error and tries mkdir(2) and +rename(2) or copy/unlink recursively. So the result is harmless. +If your application which issues rename(2) for a directory does not +support EXDEV, it will not work on aufs. +This also applies to the case when the src directory +exists on the lower readonly branch and it has child(ren). + +.\" ---------------------------------------------------------------------- +.SH EXAMPLES +The mount options are interpreted from left to right at remount-time. +These examples +show how the options are handled.
(assuming /sbin/mount.aufs was +installed) + +.nf +# mount -v -t aufs -o br:/day0:/base none /u +none on /u type aufs (rw,xino=/day0/.aufs.xino,br:/day0=rw:/base=ro) +# mount -v -o remount,\\ + prepend:/day1,\\ + xino=/day1/xino,\\ + mod:/day0=ro,\\ + del:/day0 \\ + /u +none on /u type aufs (rw,xino=/day1/xino,br:/day1=rw:/base=ro) +.fi + +.nf +# mount -t aufs -o br:/rw none /u +# mount -o remount,append:/ro /u +different uid/gid/permission, /ro +# mount -o remount,del:/ro /u +# mount -o remount,nowarn_perm,append:/ro /u +# +(there is no warning) +.fi + +.\" If you want to expand your filesystem size, aufs may help you by +.\" adding an writable branch. Since aufs supports multiple writable +.\" branches, the old writable branch can be being writable, if you want. +.\" In this example, any modifications to the files under /ro branch will +.\" be copied-up to /new, but modifications to the files under /rw branch +.\" will not. +.\" And the next example shows the modifications to the files under /rw branch +.\" will be copied-up to /new/a. +.\" +.\" Todo: test multiple writable branches policy. cpup=nearest, cpup=exist_parent. +.\" +.\" .nf +.\" # mount -v -t aufs br:/rw:/ro none /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/ro=ro) +.\" # mkfs /new +.\" # mount -v -o remount,add:1:/new=rw /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/new=rw:/ro=ro) +.\" .fi +.\" +.\" .nf +.\" # mount -v -t aufs br:/rw:/ro none /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/ro=ro) +.\" # mkfs /new +.\" # mkdir /new/a new/b +.\" # mount -v -o remount,add:1:/new/b=rw,prepend:/new/a,mod:/rw=ro /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/new/a=rw:/rw=ro:/new/b=rw:/ro=ro) +.\" .fi + +When you use aufs as the root filesystem, it is recommended to consider +excluding some directories. For example, /tmp and /var/log do not need +to be stacked in many cases. They do not usually need copyup or whiteout.
+Also a swapfile on aufs (a regular file, not a block device) is not +supported. + +There is also a good sample for network booted diskless machines. See +sample/ for details. + +.\" ---------------------------------------------------------------------- +.SH DIAGNOSTICS +When you add a branch to your union, aufs may warn you about the +privilege or security of the branch, that is, the permission bits, +owner and group of the top directory of the branch. +For example, when your upper writable branch has a world writable top +directory, +a malicious user can create any files on the writable branch directly, +imitating a copyup and modifying them manually. I am afraid it can be a security +issue. + +When you mount or remount your union without the \-o ro common mount option +and without a writable branch, aufs will warn you that the first branch +should be writable. + +.\" It is discouraged to set both of \[oq]udba\[cq] and \[oq]noxino\[cq] mount options. In +.\" this case the inode number under aufs will always be changed and may +.\" reach the end of inode number which is a maximum of unsigned long. If +.\" the inode number reaches the end, aufs will return EIO repeatedly. + +When you set udba to a value other than inotify and change something on your +branch filesystem directly, aufs may later detect some mismatches with +its cache. If it is a critical mismatch, aufs returns EIO and issues a +warning saying \[oq]try udba=inotify.\[cq] + +When an error occurs in aufs, aufs prints a kernel message with +\[oq]errno.\[cq] The priority of the message (log level) is ERR or WARNING, +depending upon the message itself. +You can convert the \[oq]errno\[cq] into the error message by perror(3), +strerror(3) or something similar. +For example, the \[oq]errno\[cq] in the message \[oq]I/O Error, write failed (\-28)\[cq] +is 28 which means ENOSPC or \[oq]No space left on device.\[cq] + +.\" .SH Current Limitation +.
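The errno conversion described in DIAGNOSTICS can be sketched in plain POSIX shell. This is an illustration only: the function name aufs_errno_name is hypothetical, the table covers just the errnos named in this manual, and the numbers are the Linux values (EIO 5, EXDEV 18, EINVAL 22, ENOSPC 28). For a complete mapping, strerror(3) remains the right tool.

```shell
# Map the negative number at the end of an aufs kernel message to an errno name.
# Hypothetical helper; only errnos mentioned in this manual, Linux numbering.
aufs_errno_name() {
	msg=$1
	num=${msg##*"(-"}     # drop everything up to and including "(-"
	num=${num%")"}        # drop the closing parenthesis
	case $num in
	5)  echo EIO ;;
	18) echo EXDEV ;;
	22) echo EINVAL ;;
	28) echo ENOSPC ;;
	*)  echo "unknown ($num)" ;;
	esac
}

# The sample message quoted in this manual.
aufs_errno_name 'I/O Error, write failed (-28)'
```

The parameter expansions strip the text around the number, so the helper works on any message that ends in the same "(-N)" form.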
+.\" ---------------------------------------------------------------------- +.\" SYNOPSIS +.\" briefly describes the command or function\[aq]s interface. For commands, this +.\" shows the syntax of the command and its arguments (including options); bold- +.\" face is used for as-is text and italics are used to indicate replaceable +.\" arguments. Brackets ([]) surround optional arguments, vertical bars (|) sep- +.\" arate choices, and ellipses (...) can be repeated. For functions, it shows +.\" any required data declarations or #include directives, followed by the func- +.\" tion declaration. +. +.\" DESCRIPTION +.\" gives an explanation of what the command, function, or format does. Discuss +.\" how it interacts with files and standard input, and what it produces on +.\" standard output or standard error. Omit internals and implementation +.\" details unless they\[aq]re critical for understanding the interface. Describe +.\" the usual case; for information on options use the OPTIONS section. If +.\" there is some kind of input grammar or complex set of subcommands, consider +.\" describing them in a separate USAGE section (and just place an overview in +.\" the DESCRIPTION section). +. +.\" RETURN VALUE +.\" gives a list of the values the library routine will return to the caller and +.\" the conditions that cause these values to be returned. +. +.\" EXIT STATUS +.\" lists the possible exit status values or a program and the conditions that +.\" cause these values to be returned. +. +.\" USAGE +.\" describes the grammar of any sublanguage this implements. +. +.\" FILES +.\" lists the files the program or function uses, such as configuration files, +.\" startup files, and files the program directly operates on. Give the full +.\" pathname of these files, and use the installation process to modify the +.\" directory part to match user preferences. 
For many programs, the default +.\" installation location is in /usr/local, so your base manual page should use +.\" /usr/local as the base. +. +.\" ENVIRONMENT +.\" lists all environment variables that affect your program or function and how +.\" they affect it. +. +.\" SECURITY +.\" discusses security issues and implications. Warn about configurations or +.\" environments that should be avoided, commands that may have security impli- +.\" cations, and so on, especially if they aren\[aq]t obvious. Discussing security +.\" in a separate section isn\[aq]t necessary; if it\[aq]s easier to understand, place +.\" security information in the other sections (such as the DESCRIPTION or USAGE +.\" section). However, please include security information somewhere! +. +.\" CONFORMING TO +.\" describes any standards or conventions this implements. +. +.\" NOTES +.\" provides miscellaneous notes. +. +.\" BUGS +.\" lists limitations, known defects or inconveniences, and other questionable +.\" activities. + +.SH COPYRIGHT +Copyright \(co 2005, 2006, 2007, 2008 Junjiro Okajima + +.SH AUTHOR +Junjiro Okajima + +.\" SEE ALSO +.\" lists related man pages in alphabetical order, possibly followed by other +.\" related pages or documents. Conventionally this is the last section. -- 1.4.4.4