Re: [RFC] Don't propagate automount

Ian Kent <raven@xxxxxxxxxx> · Fri, 27 Sep 2019 15:09:32 +0800

On Fri, 2019-09-27 at 09:35 +0800, Ian Kent wrote:
> On Thu, 2019-09-26 at 14:52 -0500, Goldwyn Rodrigues wrote:
> > An access to automounted filesystem can deadlock if it is a bind
> > mount on shared mounts. A user program should not deadlock the
> > kernel while automount waits for propagation of the mount. This is
> > explained at https://bugzilla.redhat.com/show_bug.cgi?id=1358887#c10
> > I am not sure completely blocking automount is the best solution,
> > so please reply with what is the best course of action to do
> > in such a situation.
> > 
> > Propagation of dentry with DCACHE_NEED_AUTOMOUNT can lead to
> > propagation of mount points without automount maps and not under
> > automount control. So, do not propagate them.
> 
> Yes, I'm not sure my comments about mount propagation in that
> bug are accurate.
> 
> This behaviour has crept into the kernel in reasonably recent
> times, maybe it's a bug or maybe mount propagation has been
> "fixed", not sure.
> 
> I think I'll need to come up with a more detailed description
> of what is being done for Al to be able to offer advice.
> 
> I'll get to that a bit later.

To reproduce this problem, use an autofs indirect map (here
/etc/auto.exports) that uses bind mounts and has offsets:

test	/	:/exports \
	/tmp	:/exports/tmp \
	/lib	:/exports/lib

and add:

/bind	/etc/auto.exports

to /etc/auto.master.

Finally create the bind mount directories:

mkdir -p /exports/lib /exports/tmp

Then, on a broken kernel, e.g. 4.13.9-300.fc27:

ls /bind/test

will result in:

/etc/auto.exports on /bind type autofs (rw,relatime,fd=5,pgrp=2981,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=45485)
/dev/mapper/fedora_f27-root on /bind/test type ext4 (rw,relatime,seclabel,data=ordered)
/etc/auto.exports on /bind/test/lib type autofs (rw,relatime,fd=5,pgrp=2981,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=45485)
/etc/auto.exports on /exports/lib type autofs (rw,relatime,fd=5,pgrp=2981,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=45485)
/etc/auto.exports on /bind/test/tmp type autofs (rw,relatime,fd=5,pgrp=2981,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=45485)
/etc/auto.exports on /exports/tmp type autofs (rw,relatime,fd=5,pgrp=2981,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=45485)

Not all of these mount entries were mounted by autofs.

Whereas on a kernel that isn't broken, e.g. 4.11.8-300.fc26, the same
ls command will result in:

/etc/auto.exports on /bind type autofs (rw,relatime,fd=6,pgrp=2920,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=42067)
/etc/auto.exports on /bind/test/lib type autofs (rw,relatime,fd=6,pgrp=2920,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=42067)
/etc/auto.exports on /bind/test/tmp type autofs (rw,relatime,fd=6,pgrp=2920,timeout=300,minproto=5,maxproto=5,offset,pipe_ino=42067)

All of these mount entries were mounted by autofs (and are what's
needed for these offset mount constructs).
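
For checking a kernel quickly, here is a rough sketch (not part of
autofs) that filters mount(8) output for autofs mounts outside the
automount-managed tree. The function name stray_autofs is made up for
illustration, and it assumes mount points contain no whitespace:

```shell
# stray_autofs ROOT
# Reads "SRC on TARGET type FSTYPE (OPTS)" lines on stdin and prints the
# mount point of every autofs mount that is neither ROOT nor below it.
stray_autofs() {
    awk -v root="$1" '
        $2 == "on" && $4 == "type" && $5 == "autofs" {
            if ($3 != root && index($3, root "/") != 1)
                print $3
        }'
}

# Typical use: mount | stray_autofs /bind
```

Fed the 4.13.9 listing above, stray_autofs /bind prints /exports/lib
and /exports/tmp; fed the 4.11.8 listing it prints nothing.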

If automount makes the /bind mount a propagation slave or private at
mount time, the problem doesn't happen; that is the workaround
I implemented in autofs.
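
Doing something similar by hand, after the fact rather than at mount
time as autofs does, would look roughly like the sketch below. The
--make-private and --make-slave options are the util-linux mount(8)
ones; the is_shared helper is hypothetical and only reads the optional
fields of /proc/self/mountinfo to show whether a mount point is still
shared:

```shell
# is_shared MOUNTPOINT [MOUNTINFO_FILE]
# Succeeds if any mount at MOUNTPOINT carries a "shared:N" optional field.
is_shared() {
    awk -v mp="$1" '
        $5 == mp {
            # Optional fields start at field 7 and end at the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/)
                    found = 1
        }
        END { exit !found }' "${2:-/proc/self/mountinfo}"
}

# As root, once automount has mounted /bind:
#   mount --make-private /bind      # or: mount --make-slave /bind
#   is_shared /bind || echo "/bind is no longer shared"
```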

I initially thought this was the result of a "fix" in the mount
propagation code, but it occurred to me that propagation is meant
to occur between mount trees, not within them, so this might be a
bug.

I probably should have worked out exactly what upstream kernel
this started happening in and then done a bisect and tried to
work out if the change was doing what it was supposed to.

Anyway, I'll need to do that now for us to discuss this sensibly.

> 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> > 
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 49f6d7ff2139..b960805d7954 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -292,6 +292,9 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
> >  	struct mount *m, *n;
> >  	int ret = 0;
> >  
> > +	if (source_mnt->mnt_mountpoint->d_flags & DCACHE_NEED_AUTOMOUNT)
> > +		return 0;
> > +
> 
> Possible problem with this is it will probably prevent mount
> propagation in both directions which will break stuff.
> 
> I had originally assumed the problem was mount propagation
> back to the parent mount but now I'm not sure that this is
> actually what is meant to happen.
> 
> >  	/*
> >  	 * we don't want to bother passing tons of arguments to
> >  	 * propagate_one(); everything is serialized by namespace_sem,
> >