Re: Fallthrus as full-length symlinks?

Erez Zadok <ezk@xxxxxxxxxxxxx> · Fri, 13 Nov 2009 13:46:15 -0500

In message <20091113174631.GD19656@shell>, Valerie Aurora writes:
> Fallthrus were invented as placeholders for readdir() on a
> union-mounted directory - basically, to use the top-level file
> system's readdir() cookie mechanism.  Fallthrus are persistent
> directory entries and are implemented by the underlying file system -
> such as ext2 or tmpfs - in whatever way it sees fit.  We've
> implemented them for ext2 in two ways: as a regular directory entry
> with a magic inode number, and as a regular directory entry with a
> special file type.

Other than a possible improvement to ->rename, what's wrong with the idea of
a special dirent flag?  I kinda liked that idea: it's simple and requires
only a small amount of change to lower file systems.  Any idea in which you
have to record the whiteouts using an actual file or inode is more
cumbersome.
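
To make the dirent-flag idea concrete, here is a small user-space toy model in Python. It is not kernel code and not how ext2 does it. The names Kind, fill_fallthrus, union_lookup and union_readdir are hypothetical, and the semantics are my assumption of what a fallthru means: "resolve this name in the layer below". A name that is absent from the top directory is also assumed to resolve below.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical per-dirent type flag stored by the top-level f/s."""
    REGULAR = 1
    WHITEOUT = 2   # name is deleted in the union view
    FALLTHRU = 3   # placeholder: the real object lives in the lower layer


def fill_fallthrus(top, lower):
    """Add a fallthru to the top directory for every lower name that the
    top directory does not already mention (as a file or a whiteout)."""
    for name in lower:
        top.setdefault(name, Kind.FALLTHRU)


def union_lookup(top, lower, name):
    """Return which layer serves the name: "top", "lower", or None."""
    kind = top.get(name)
    if kind is Kind.REGULAR:
        return "top"
    if kind is Kind.WHITEOUT:
        return None
    # Fallthru, or no top-level entry at all: consult the lower layer.
    return "lower" if name in lower else None


def union_readdir(top):
    """After fill_fallthrus(), readdir only has to walk the top directory,
    so the top-level f/s's own cookie mechanism is enough."""
    return sorted(n for n, k in top.items() if k is not Kind.WHITEOUT)
```

The point of the sketch is that the flag lives in the directory entry itself, so no extra file or inode has to be allocated per fallthru or whiteout.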

> Recently, David Woodhouse suggested implementing fallthrus as
> full-length symlinks with a special flag.

Where does this "special flag" go?  Is it persistent?  Is it new?  Would
that mean having to change lower file systems to teach them about this flag?

Is there a way of doing it w/o having to change lower f/s code at all?
That'll be a major advantage if possible.

> The interesting thing about
> this idea is that it could theoretically let us rename a file from the
> low level file system to another place in the low-level file system
> without copying the contents of the file up.  Basically, we can
> arbitrarily swizzle the namespace of the low-level by maintaining a
> set of symlinks above.

So now you're proposing to allow something like multiple writeable branches,
in that you allow something other than the topmost branch to be modified.
Moreover, it appears that what you're proposing will need to modify two or
more branches, right?

Maybe I don't understand what this symlink idea is all about.  But I do know
(and have documented in a TOS article) that ->rename is the most difficult
op to implement in a unioning system.

Getting multiple writeable branches to work reliably in Unionfs was a major
challenge, rename in particular.  We first tried to make rename optimal: to
rename within the branch a file was already in, or to minimize modifications
across layers, so as to reduce copyup.  But we found it hard to do reliably,
esp. when you take into account whiteouts and opaque directories; it gets
even worse when you consider multiple concurrent renames which could touch a
subset of your layers.

See, some unioning operations have to proceed from top to bottom (e.g.,
lookups), while others have to proceed from some middle layer going upwards
(e.g., copyups).  This is a recipe for deadlocks and other races, b/c
there's no clear ordering of operations.  After trying this rename
optimization for several years in Unionfs, we gave up on it and just went
with a simpler policy: either you rename within the same branch, or you copy
the file up under its new name (and optionally white out the source).
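
The simpler policy can be sketched as a user-space toy model in Python. This is not the Unionfs code: Branch, lookup and rename are hypothetical names, branch 0 is the topmost layer, and for brevity the sketch assumes that only branch 0 is writeable, with no locking and no directories.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    files: dict = field(default_factory=dict)    # name -> contents
    whiteouts: set = field(default_factory=set)  # names hidden from below
    writable: bool = False


def lookup(branches, name):
    """Top-to-bottom lookup; returns (branch index, contents) or None."""
    for i, b in enumerate(branches):
        if name in b.files:
            return i, b.files[name]
        if name in b.whiteouts:
            return None          # a whiteout stops the search
    return None


def rename(branches, src, dst):
    """Rename within the top branch, or copy up under the new name."""
    top = branches[0]
    if not top.writable:
        raise PermissionError("top branch is read-only")
    found = lookup(branches, src)
    if found is None:
        raise FileNotFoundError(src)
    idx, data = found
    if idx == 0:
        del top.files[src]       # same-branch rename: drop the old name
    top.files[dst] = data        # otherwise this is the copyup
    top.whiteouts.discard(dst)   # dst exists again in the union view
    if any(src in b.files for b in branches[1:]):
        top.whiteouts.add(src)   # hide the lower copy of the source
    return "same-branch" if idx == 0 else "copyup"
```

Every modification lands in the top branch, so nothing ever has to walk the stack upwards from a middle layer. The price is a full copyup whenever the source lives in a lower branch.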

I'm not saying it can't be done, but I'm concerned that these symlinks may
add too much complexity to the code.  One would need to think very carefully
about every operation that could affect these symlinks, and how they might
interact with concurrent f/s operations, as well as ops that traverse the
stack in different directions.

Also, aren't you worried about the symlinks cluttering the lower namespace
and consuming inodes and data blocks?  (That has been one of the criticisms
of Unionfs.)

> Is this useful?  Is it implementable?

It's Deja Vu all over again. :-)

> Background reading:
> 
> http://valerieaurora.org/union/
> 
> -VAL

Erez.