[UNIONFS] 00/42 Unionfs and related patches review

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Al, Christoph, and Andrew,

As per your request, I'm posting for review the unionfs code (and related
code) that's in my korg tree against mainline (v2.6.24-rc4-190-g94545ba).
This code is nearly identical to what's in -mm (the mm code has a couple of
additional things that depend on mm-specific patches that aren't in mainline
yet).

I really tried to keep this message short, by offering pointers to more
info, but still there's a bunch of info here.

Andrew, you've asked me to list the main issues that came about in
discussions regarding unionfs, and how were they addressed.  So I've
reviewed my notes from OLS'06, LSF'07, and OLS'07, as well as assorted
postings in mailing lists, and I came up with this prioritized list (in
descending priority order):

1. cache coherency
2. nameidata handling
3. namespace pollution
4. use of ioctls for branch management

(1) Cache coherency: by far, the biggest concern had been around cache
    coherency: what happens if someone modifies a lower object
    (file/dir/etc.).  I met with Mike Halcrow in October and we discussed
    stacking in general; Mike also emphasized that cache-coherency was one
    of his most pressing concerns in ecryptfs.

At OLS'06, several suggestions were made, including fancy tricks to hide the
lower namespace or "lock" it so users have readonly access.  None of these
solutions would have been able to easily handle the problem of an existing
open file descriptor on a lower file, and they might have required
significant VFS changes.  Moreover, unionfs users actually want to modify
lower branches directly, and then be able to see their changes reflected in
the union immediately.  So we explored a number of ideas.  We feel that the
VFS is complex enough so we tried our best to handle cache-coherency inside
unionfs.  The solution we have implemented is to compare the mtime/ctime of
upper/lower objects during revalidation (esp. of dentries); and if the lower
times are newer, we reconstruct the union object (drop the older objects,
and re-lookup them).  This time-based cache-coherency works well and is
similar to the NFS model.  Because Unionfs users tend to have a burst of
activity on lower branches, our current cache-coherency also defers the
revalidation actions until absolutely needed, so this idea tends to also be
more efficient for the common usage patterns.  More details about how we
handle cache-coherency are available in our
Documentation/filesystems/unionfs/concepts.txt file.

That said, we're now developing some VFS patches that would allow lower file
systems to more directly inform the upper objects about such (mtime)
changes.  We're exploring a couple of different options but our key goals
are to (a) minimize VFS changes and (b) avoid any changes to lower file
systems.

(2) nameidata handling.  Another important question raised (esp. by NFS
    people) was how we handle struct nameidata.  The VFS passes nameidata
    structs to file systems, and some file systems use that.  We used to
    either pass NULL or the upper nd to the lower f/s.  That caused NULL
    de-refs inside nfsv4, among other problems.  We now create our own
    nameidata structure, fill it up as needed (esp. for intent data), and
    pass it down.  We do this every time we call any VFS function that takes
    a nameidata (e.g., vfs_create).  This seems to work well.

There's been some discussion on lkml about splitting struct nameidata in
two, one of which would handle just the intent information.  I'd like to see
that happen, maybe even help, because right now we pass a whole large-ish
struct nameidata for just a couple of intent bits of information that the
lower f/s needs.

(3) namespace pollution.  Unioning readonly and readwrite directories
    requires the ability to mask, or white-out, files that are being deleted
    from a readonly directory.  Unionfs does this in a portable way, by
    creating .wh.XXX files to indicate that file XXX has been whited-out.
    This works well on many file systems, but it tends to clutter lower
    branches with these .wh.* files.  We recently optimized our whiteout
    creation algorithm so it minimizes the number of conditions in which
    whiteouts are created, and that helped some people a lot.  But still, if
    you unify a readonly and writeable branch, and you try to delete a file
    from the readonly branch/medium, there's no way to avoid creating some
    sort of a whiteout.  BTW, of course, these whiteouts are completely
    hidden from the view of the user who accesses files/dirs via the union.

In the long run, we really hope to see native whiteout support in Linux (ala
BSD).  Of course, this would require a change to the VFS and several native
file systems (possibly even a change to the on-disk format), so we realize
that this isn't likely to happen soon.  If/when native whiteout support was
available, unionfs could easily use it.  Until that time, we have lots of
users who want to use unionfs on top of numerous different file systems, and
so we have to do the next best thing wrt whiteouts.

This is a good point to mention that the version of unionfs in -mm is 2.1.x.
We have been working on a newer and still experimental version of Unionfs,
called "Unionfs with On-Disk Format" or Unionfs-ODF.  Unionfs-ODF uses a
small persistent store (e.g., a small ext2 partition) to store whiteouts in,
among other info; this moves the union-level meta-data (e.g., whiteouts),
outside the lower file systems, and thus eliminates the need to create .wh.*
files.  Unionfs-ODF has other useful benefits, and you can get more detail
about it here: <http://www.filesystems.org/unionfs-odf.txt>.  We recently
sync'ed up our unionfs 2.1 and unionfs-odf releases and we're tracking
Linus's tree for both.  IOW, every fix and user-visible feature that has
gone into unionfs in -mm, is now also in unionfs-odf.  Our intent is to
continue to develop both versions, and gradually move features from
unionfs-odf into unionfs 2.1; this would be possible even if/after
unionfs-2.1 gets merged, because the changes will all be internal to the
implementation, and users won't need to change the way they, say, mount a
union or manipulate its branches.

(4) branch management.  One of the most useful features of unioning is to be
    able to add/remove branches from the union.  We used to do this via
    ioctl's, which was considered racy, unclean, and non-atomic (only one
    branch-manipulation operation at a time).  We now do that via the
    remount interface, and allow users to pass multiple branch-manipulation
    commands, which are handled as one action.


* GENERAL

I should note that my philosophy in developing any stackable file system had
been to minimize changes to the VFS, and to not change any lower file system
whatsoever: that ensures that unionfs couldn't affect the stability of
performance of the rest of the kernel.  Still, some of the things unionfs
does could possibly be done more cleanly and easily at the VFS level (e.g.,
better hooks for cache coherency).

Unionfs 2.1.x is currently maintained on 2.6.9 and all major kernels since
2.6.18, all the way to Linus's latest 2.6.24-rc tree and -mm.  We've got a
lot users who use unionfs in more creative ways than even we could think of,
and this has helped us find the RIGHT set of features to please the users,
as well as stabilize the code.  Before every new release, we test the new
code on all versions using ltp-full, parallel compiles, and our own
unionfs-aware regression suite which exercises unionfs's unique features
(e.g., copy-up).

I therefore believe that unionfs is in a good enough shape now to be
considered for merging in 2.6.25.  The user-visible behavior isn't likely to
change; and any changes to the VFS to better support stacking, could be
handled internally in subsequent kernels without affecting how users use
unionfs.  Aside from greater exposure to stackable file systems and unionfs,
I think one of the other important benefits of a merge could be that we'd
have more than one stackable f/s in the kernel (i.e., ecryptfs and unionfs);
this would allow us to slowly and gradually generalize the VFS so it can
better support stackable file systems.


Lastly, diffstats:

 Documentation/filesystems/00-INDEX             |    2 
 Documentation/filesystems/unionfs/00-INDEX     |   10 
 Documentation/filesystems/unionfs/concepts.txt |  199 ++++
 Documentation/filesystems/unionfs/issues.txt   |   24 
 Documentation/filesystems/unionfs/rename.txt   |   31 
 Documentation/filesystems/unionfs/usage.txt    |  115 ++
 MAINTAINERS                                    |    9 
 fs/Kconfig                                     |   53 -
 fs/Makefile                                    |    1 
 fs/drop_caches.c                               |    4 
 fs/ecryptfs/dentry.c                           |    2 
 fs/ecryptfs/inode.c                            |    6 
 fs/ecryptfs/main.c                             |    2 
 fs/namei.c                                     |    1 
 fs/stack.c                                     |   30 
 fs/unionfs/Makefile                            |   13 
 fs/unionfs/commonfops.c                        |  827 +++++++++++++++++
 fs/unionfs/copyup.c                            |  897 +++++++++++++++++++
 fs/unionfs/debug.c                             |  532 +++++++++++
 fs/unionfs/dentry.c                            |  498 ++++++++++
 fs/unionfs/dirfops.c                           |  290 ++++++
 fs/unionfs/dirhelper.c                         |  272 +++++
 fs/unionfs/fanout.h                            |  355 +++++++
 fs/unionfs/file.c                              |  227 ++++
 fs/unionfs/inode.c                             | 1154 +++++++++++++++++++++++++
 fs/unionfs/lookup.c                            |  652 ++++++++++++++
 fs/unionfs/main.c                              |  783 ++++++++++++++++
 fs/unionfs/mmap.c                              |  338 +++++++
 fs/unionfs/rdstate.c                           |  285 ++++++
 fs/unionfs/rename.c                            |  533 +++++++++++
 fs/unionfs/sioq.c                              |  119 ++
 fs/unionfs/sioq.h                              |   92 +
 fs/unionfs/subr.c                              |  242 +++++
 fs/unionfs/super.c                             | 1020 ++++++++++++++++++++++
 fs/unionfs/union.h                             |  591 ++++++++++++
 fs/unionfs/unlink.c                            |  236 +++++
 fs/unionfs/xattr.c                             |  153 +++
 include/linux/fs_stack.h                       |   21 
 include/linux/magic.h                          |    2 
 include/linux/mm.h                             |    2 
 include/linux/namei.h                          |   13 
 include/linux/union_fs.h                       |   24 
 42 files changed, 10624 insertions(+), 36 deletions(-)


Thanks,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux