Re: [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

Nick Piggin <npiggin@xxxxxxxxx> · Thu, 13 Jan 2011 14:53:57 +1100

On Wed, Jan 12, 2011 at 4:48 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Add a dentry op (d_automount) to handle automounting directories rather than
> abusing the follow_link() inode operation.  The operation is keyed off a new
> inode flag (S_AUTOMOUNT).
>
> This makes it easier to add an AT_ flag to suppress terminal segment automount
> during pathwalk.  It should also remove the need for the kludge code in the
> pathwalk algorithm to handle directories with follow_link() semantics.
>
> A new pathwalk subroutine, follow_automount() is added to handle mountpoints.
> It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op was
> supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR
> if the walk point should be used without mounting and 0 if successful.  path
> will be updated if an automount took place to point to the mounted filesystem.
>
> I've only changed __follow_mount() to handle call follow_automount(), but it
> might be necessary to change follow_mount() too.  The latter is only called
> from follow_dotdot(), but any automounts on ".." should be pinned whilst we're
> using a child of it.
>
> I've also extracted the mount/don't-mount logic from autofs4 and included it
> here.  It makes the mount go ahead anyway if someone calls open() or creat(),
> tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
> or sticks a '/' on the end of the pathname.  If they do a stat(), however,
> they'll only trigger the automount if they didn't also say O_NOFOLLOW.
>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> Acked-by: Ian Kent <raven@xxxxxxxxxx>

I would still prefer to see a .follow_mount API, and not tie in this automount
specific inode detail to what could be a more flexible dentry-only API.

The default NULL implementation would do nothing, and follow_automount
stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c.

I would rather that the automount call just attach the mount and then go
back to namei.c where it does the mount hash lookup.

That should take like a couple of lines in the path lookup code.

>  locking rules:
>                rename_lock     ->d_lock        may block       rcu-walk
> @@ -29,6 +30,7 @@ d_delete:     no              yes             no              no
>  d_release:     no              no              yes             no
>  d_iput:                no              no              yes             no
>  d_dname:       no              no              no              no
> +d_automount:   no              no              no              yes

> +static void __follow_mount_rcu(struct nameidata *nd, struct path *path,
> +                               struct inode **inode)
> +{
> +       while (d_mountpoint(path->dentry)) {
> +               struct vfsmount *mounted;
> +               mounted = __lookup_mnt(path->mnt, path->dentry, 1);
> +               if (!mounted)
> +                       return;
> +               path->mnt = mounted;
> +               path->dentry = mounted->mnt_root;
> +               nd->seq = read_seqcount_begin(&path->dentry->d_seq);
> +               *inode = path->dentry->d_inode;
> +       }
> +}

Where we have 2 functions, one with _rcu postfix, the _rcu version is used
for rcu-walk lookups, and the other one used for ref-walk lookups.

This means they have to provide the exact same functionality, or where
that isn't possible, the _rcu variant must return -ECHILD.

So something has gone wrong here. You have documented .d_automount
can be called in rcu-walk mode, but it doesn't seem to be the case. And in
the _rcu variant you skip checking automount functionality entirely. So at
least you'd have to check for d_automount_point and bail out if it exists. But
you need to be careful:

> +#define d_automount_point(dentry) \
> +       (dentry->d_inode && IS_AUTOMOUNT(dentry->d_inode))

> @@ -277,6 +278,7 @@ struct inodes_stat_t {
>  #define IS_SWAPFILE(inode)     ((inode)->i_flags & S_SWAPFILE)
>  #define IS_PRIVATE(inode)      ((inode)->i_flags & S_PRIVATE)
>  #define IS_IMA(inode)          ((inode)->i_flags & S_IMA)
> +#define IS_AUTOMOUNT(inode)    ((inode)->i_flags & S_AUTOMOUNT)

This guy will oops in rcu-walk mode, because dentry->d_inode can go
NULL at any time.

If you do want to do this kind of thing in rcu-walk mode, the path walking
code carries the inode around for use (instead of ->d_inode).

If you make sure to have dropped out from rcu-walk mode, then d_inode
can be relied on to be stable, of course.

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html