Re: UnionMount status?

Michal Suchanek <hramrach@xxxxxxxxxx> · Fri, 19 Mar 2010 22:47:15 +0100

Hello

On 19 March 2010 19:03, Valerie Aurora <vaurora@xxxxxxxxxx> wrote:

>
> Where union mounts is right now is in need of more review from VFS
> experts (and thanks to those who have already reviewed it).  I'm

I don't count myself among VFS experts so I'm sorry if I am restating
or missing something obvious.

> rewriting the in-file copyup code right now, which is dependent on a
> lot of ongoing VFS work by Al Viro, Nick Piggin, Dmitriy Monakhov, and
> others.  Here's my description of the problem I'm currently working,
> which is where I could use review the most:
>
> http://groups.google.com/group/linux.kernel/msg/217ca5aedbd7bfd0
>

On Mar 16, 7:20 pm, Valerie Aurora <vaur...@xxxxxxxxxx> wrote:
> This patch shows the basic data flow necessary to implement efficient
> union mount file copyup (without the actual file copyup code).  I need
> input from other VFS people on design, especially since namei.c is
> getting some much needed reorganization.  Some background:
>
> In union mounts, a file on the lower, read-only file system layer is
> copied up to the top layer when an application attempts to write to
> it.  This is in-kernel file copyup.  Some constraints make this
> difficult:
>
> 1. Don't copyup if the operation would fail (e.g., open(O_WRONLY) on a
> file with mode 444).  It's inefficient and a possible security hole to
> copy up a file if no write (or maybe even read) can occur anyway.

The open fails in that case anyway so I see no reason to copy
anything. Why would you copy before you open?

On the other hand, when the open succeeds there is nothing stopping
the writes from happening save things like hardware failure or lack of
disk space. It's appropriate to create an empty inode in this case.
Did you consider creating the files as sparse and handling holes by
looking into the lower layer before /dev/zero? But then you would
perhaps need a flag that differentiates them from real sparse files.

Actually the file has to be copied even when it is open for reading
because if somebody writes it later the readonly bottom handle would
never receive the top updates.

>
> 2. Near-zero added cost for operations on non-union paths in a
> CONFIG_UNION_MOUNT kernel.  One or two flag tests is okay, but
> grabbing a lock or doing a lookup operation to find out if a file
> should be copied up is too expensive.

As I see it any file can be a part of union mount in two ways:

1) You are just looking it up so you do pay the lookup costs and you
will know where it is and if that is part of a union.

2) it's already open so you have  a handle to the file and this handle
can be tagged in some way. It is probably associated with the
originating filesystem anyway so  you just need to look that way.

>
> 3. A file should be copied up if the path (mnt + dentry) of its parent
> is from a union top layer (MNT_UNION set in mnt->mnt_flags).  We can't
> tell if a file can be copied up by looking at its inode, dentry, or
> even path, we have to know what its parent path is.

Let's see what the lookup may be like:

0) No lookup: The file is open. It's always top, no issue here.

1) Perform lookup:
  - if the path goes through non-union directory structure nothing
interesting is going on
  - if the path is currently being looked up inside an union mount it
can happen that a bottom directory exists without a corresponding top
directory. In this case the top directory has to be created. Otherwise
users who have the bottom directory open would not see new files added
(and removed) in the top layer.
  - the fact that lookup is (or is not) going on inside an union mount
is a single boolean flag which may change only on mountpoint
boundaries.

During the lookup it's easy to determine if sufficient permissions
exist to open a file. Directory permissions are recorded on the top
directories or are copied from the bottom directories to the top
directories created during lookup. Additionally files which are only
in the bottom layer can have permissions in there. If the file can be
opened then the file can be copied and must be copied on open.

I guess that the union directories are a bit special in that they are
top directories but have continuation in the bottom directory. Still
this should not concern the user as they should not get the bottom
directory handle nor the top directory handle but a union directory
handle.

On Wed, Mar 17, 2010 at 07:51:31AM +0900, J. R. Okajima wrote:

> Valerie Aurora:
> > 1. Don't copyup if the operation would fail (e.g., open(O_WRONLY) on a
> > file with mode 444).  It's inefficient and a possible security hole to
> > copy up a file if no write (or maybe even read) can occur anyway.

> Just a question.
> How about this case?
> When the file is writable (0644 or something) but its parent directory
> is readonly (0555), do you think the file should be copied-up?

It must appear to be copied up in one way or another.
If the mode can be changed and the FS code does not have some giant
lock that allows only one open handle at a time it cannot rely on the
directory not changing just because it does not appear to have any
write permissions at the time.

Actually in a very primitive filesystem that locks the whole directory
it might hold that the directory cannot be changed while its mode says
it is readonly but it would still have to be quite non-reentrant for
that as the directory mode and contents should be quite separate.

J. R. Okajima:
> Valerie Aurora:
> > I think what people will expect is that we copy up in that case.  I
> > can think of ways this can go wrong, but perhaps that should be an
> > explicit requirement on the top-layer file system, that it can handle
> > create/unlink() in a directory without write permission.

> I am not sure such requirement is the right way.
> How about delegating open() to keventd or some other workqueue which
> will succeed to create files under a directory without write permission?
> Of course, we should handle some error cases after creating a file.

This will fail anyway when the filesystem is actually readonly. I
guess you need the equivalent of ramfs/tmpfs to store the directory
structure. Although if you plan on supporting more than two layers in
the future you may require the top layer to be writable and still
achieve reasonable flexibility.

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html