Re: Union mount and lockdep design issues

Ian Kent <ikent@xxxxxxxxxx> · Tue, 12 Jul 2011 01:23:53 +0800

On Mon, 2011-07-11 at 18:17 +0200, Michal Suchanek wrote:
> On 11 July 2011 15:50, Ian Kent <ikent@xxxxxxxxxx> wrote:
> > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote:
> >> On 11 July 2011 14:00, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
> >> >> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> >>
> >>
> >> >> > Also, why would you want to have a class per sb-instance? From last
> >> >> > talking to David, he said there could only ever be 2 filesystems
> >> >> > involved in this, the top and bottom, and it is determined on (union)
> >> >> > mount time which is which.
> >> >>
> >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
> >> >> though I think only one lowerfs is accessed at a time.
> >> >
> >> > Right, however I understood from our earlier discussion that the vfs
> >> > would only ever try to lock 2 filesystems at a time, the top and one
> >> > lower.
> >>
> >> This is true from local point of view. However, it is technically
> >> possible to use overlayfs as the upper layer of another overlayfs
> >> which allows layering multiple readonly "branches" into a single
> >> overlay. Since the vfs will lock the "union" and one (or possibly
> >> both) of its branches and one of the branches may be itself an union
> >> you can get arbitrary depth (which is currently limited by a constant
> >> in the code to cut recursion depth and stack usage).
> >
> > Off topic but can you elaborate on that?
> >
> > Are you saying the "unioned stack" can consist of more than two file
> > systems and can have more than two layers and possibly a mix of multiple
> > read-only and read-write file systems?
> >
> 
> This is how requirements are described in documentation:
> 
> > The lower filesystem can be any filesystem supported by Linux and does
> > not need to be writable.  The lower filesystem can even be another
> > overlayfs.  The upper filesystem will normally be writable and if it
> > is it must support the creation of trusted.* extended attributes, and
> > must provide valid d_type in readdir responses, at least for symbolic
> > links - so NFS is not suitable.
> 
> In no place it says that the lower filesystem is required to be
> readonly, only that it should not be modified.
> 
> 
> This is what the documentation gives as example:
> 
> > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay
> 
> This is how it can be expanded:
> 
> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay

OK, I'll have to think about what this means, but I suspect that it is
broken. I'll have a look at the overlayfs code and see if there is a
globally enforced ordering of stacked file systems. If there is none,
then I believe overlayfs is probably open to AB <-> BA deadlock, because
two file systems could be locked in one order in one overlayfs stack
and in the opposite order in another.
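
To make the AB <-> BA concern concrete, here is a minimal user-space
sketch in Python. It is not kernel code and not how overlayfs or lockdep
work internally. It assumes, purely for illustration, that each stack
locks its upper file system before its lower one, records that as an
ordering edge, and reports a cycle if two stacks disagree. The names
find_inversion and lock_order_edges are hypothetical.

```python
from collections import defaultdict


def lock_order_edges(stacks):
    """Build 'locked before' edges from (upper, lower) pairs.

    Assumption for this sketch: a stack locks its upper fs first,
    then its lower fs.
    """
    edges = defaultdict(set)
    for upper, lower in stacks:
        edges[upper].add(lower)
    return edges


def find_inversion(stacks):
    """Return a list of file systems forming a lock-order cycle, or None."""
    edges = lock_order_edges(stacks)
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = defaultdict(int)
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in sorted(edges.get(node, ())):
            if state[nxt] == GREY:
                # nxt is already on the current path: ordering cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(edges):
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

With the nested example quoted above, find_inversion([("/upper",
"/lower2"), ("/tmpoverlay", "/lower1")]) finds no cycle, whereas two
stacks built as ("A", "B") and ("B", "A") yield ["A", "B", "A"], which
is the inversion I am worried about.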

I don't remember seeing any unioning file system that checks and
enforces this type of global ordering, although I think the special-case
checks of union mount pretty much cover it, AFAICT. It's been a while
since I looked at the code for any of the unioning file systems, so I may
be wrong.

Assuming I am correct, though, that then defines restrictions on what
should (or can) be allowed from a lockdep POV.

Ian

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html