Re: [RFC] Don't propagate automount

Ian Kent <raven@xxxxxxxxxx> · Wed, 02 Oct 2019 10:14:09 +0800

On Tue, 2019-10-01 at 14:09 -0500, Goldwyn Rodrigues wrote:
> Hi Ian,
> 
> Sorry for the late reply, I had to setup and environment for your
> specific case and it took time.

np.

> 
> On  9:47 28/09, Ian Kent wrote:
> > On Fri, 2019-09-27 at 11:16 -0500, Goldwyn Rodrigues wrote:
> > > On 18:51 27/09, Ian Kent wrote:
> > > > On Fri, 2019-09-27 at 15:41 +0800, Ian Kent wrote:
> > > > > > > I initially thought this was the result of a "fix" in the
> > > > > > > mount
> > > > > > > propagation code but it occurred to me that propagation
> > > > > > > is
> > > > > > > meant
> > > > > > > to occur between mount trees not within them so this
> > > > > > > might be
> > > > > > > a
> > > > > > > bug.
> > > > > > > 
> > > > > > > I probably should have worked out exactly what upstream
> > > > > > > kernel
> > > > > > > this started happening in and then done a bisect and
> > > > > > > tried to
> > > > > > > work out if the change was doing what it was supposed to.
> > > > > > > 
> > > > > > > Anyway, I'll need to do that now for us to discuss this
> > > > > > > sensibly.
> > > > > > > 
> > > > > > > > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/pnode.c b/fs/pnode.c
> > > > > > > > > index 49f6d7ff2139..b960805d7954 100644
> > > > > > > > > --- a/fs/pnode.c
> > > > > > > > > +++ b/fs/pnode.c
> > > > > > > > > @@ -292,6 +292,9 @@ int propagate_mnt(struct mount
> > > > > > > > > *dest_mnt,
> > > > > > > > > struct
> > > > > > > > > mountpoint *dest_mp,
> > > > > > > > >  	struct mount *m, *n;
> > > > > > > > >  	int ret = 0;
> > > > > > > > >  
> > > > > > > > > +	if (source_mnt->mnt_mountpoint->d_flags &
> > > > > > > > > DCACHE_NEED_AUTOMOUNT)
> > > > > > > > > +		return 0;
> > > > > > > > > +
> > > > > > > > 
> > > > > > > > Possible problem with this is it will probably prevent
> > > > > > > > mount
> > > > > > > > propagation in both directions which will break stuff.
> > > 
> > > No, I am specifically checking when the source has a automount
> > > flag
> > > set.
> > > It will block only one way. I checked it with an example.
> > 
> > I don't understand how this check can selectively block
> > propagation?
> > 
> > If you have say:
> > test    /       :/exports \
> >         /tmp    :/exports/tmp \
> >         /lib    :/exports/lib
> > 
> > and
> > /bind	/etc/auto.exports
> > 
> > in /etc/auto.master
> > 
> > and you use say:
> > 
> > docker run -it --rm -v /bind:/bind:slave fedora-autofs:v1 bash
> > 
> > your saying the above will not propagate those offset trigger
> > mounts
> > to the parent above the /bind/test mount but will propagate them to
> > the container?
> 
> Yes, It works for me. I could not find the fedora-autofs, but used
> fedora image. Check both cases. The first one (vanilla) is my
> modified
> kernel with the patch I mentioned.

Sorry, my bad, fedora-autofs is simply fedora container image setup
to test autofs, the straight fedora image (or any other Linux os
image for that matter) should be fine for this.

> 
> [root@fedora30 ~]# cat /etc/auto.master
> /bind   /etc/auto.exports 
> 
> Note, there is no options. I added -bind in the config you provided.

Ok, I thought that wouldn't be needed, but the /exports directory
tree would need to exist in the container file system.

> 
> [root@fedora30 ~]# cat /etc/auto.exports 
> test -bind   /       :/exports \
>         /tmp    :/exports/tmp \
>         /lib    :/exports/lib
> 
> [root@fedora30 ~]# uname -a
> Linux fedora30 5.3.1vanilla+ #9 SMP Tue Oct 1 11:11:11 CDT 2019
> x86_64 x86_64 x86_64 GNU/Linux
> [root@fedora30 ~]# docker run -it --rm -v /bind:/bind:slave fedora
> bash
> [root@cf881c09f90a /]# ls /bind
> [root@cf881c09f90a /]# ls /bind/test
> lib  tmp
> [root@cf881c09f90a /]# ls /bind/test/lib
> lib-file

Ok, maybe it's the container root file system being marked
as propagation private that makes this behave differently
or it could be because the container using a different mount
name space.

Still, it's curious that you get mounts from the init mount
name space propagating to the container with that kernel
change.

I don't understand what's going on there, the change looks
like it prevents propagation across the board. Or maybe it
isn't actually mount propagation that causes this to happen
and the change papers over some other bug.

I admit I always thought it was the propagation too.

> 
> However, on existing fedora 30 kernel..
> 
> [root@fedora30 ~]# uname -a
> Linux fedora30 5.2.17-200.fc30.x86_64 #1 SMP Mon Sep 23 13:42:32 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> [root@fedora30 ~]# docker run -it --rm -v /bind:/bind:slave fedora
> bash
> [root@0a1d4f3dc475 /]# ls /bind
> [root@0a1d4f3dc475 /]# ls /bind/test
> lib  tmp
> [root@0a1d4f3dc475 /]# ls /bind/test/lib
> <hangs>
> 

Ok, I'll have a look at that case myself but it sounds like
the unwanted propagation is occurring with the unpatched
kernel.

> 
> > It looks like that check is all or nothing to me?
> > Can you explain a bit more.
> > 
> > > > > > > > I had originally assumed the problem was mount
> > > > > > > > propagation
> > > > > > > > back to the parent mount but now I'm not sure that this
> > > > > > > > is
> > > > > > > > actually what is meant to happen.
> > > > 
> > > > Goldwyn,
> > > > 
> > > > TBH I'm already a bit over this particularly since it's a
> > > > solved problem from my POV.
> > > > 
> > > > I've gone back as far as Fedora 20 and 3.11.10-301.fc20 also
> > > > behaves like this.
> > > 
> > > The problem started with the root directory being mounted as
> > > shared.
> > 
> > The change where systemd set the root file system propagation
> > shared was certainly an autofs pain point (for more than just
> > this case too) but I was so sure that wasn't when this started
> > happening.
> > 
> > But ok, I could be mistaken, and you seem to be sure about.
> 
> Well, it might as well be with the propagation code. I am not sure
> what introduced this.

Yeah, it is an annoying problem, I can certainly agree with
that.

I've had difficulty working out what's going on with the
mount propagation code over time (there's so many special
cases in various places) so I'd really like to understand
how this change appears to achieve what's needed.

It doesn't look like it should.

> 
> > > > Unless someone says this behaviour is not the way kernel
> > > > mount propagation should behave I'm not going to spend
> > > > more time on it.
> > > > 
> > > > The ability to use either "slave" or "private" autofs pseudo
> > > > mount options in master map mount entries that are susceptible
> > > > to this mount propagation behaviour was included in autofs-
> > > > 5.1.5
> > > > and the patches used are present on kernel.org if you need to
> > > > back port them to an earlier release.
> > > 
> > > What about "shared" pseudo mount option? The point is the default
> > > shared option with automount is broken, and should not be exposed
> > > at all.
> > 
> > What about shared mounts?
> > 
> > I don't know of a case where propagation shared is actually needed.
> > If you know of one please describe it.
> 
> No, I don't have a use case for shared mounts. I am merely trying to
> emphasize the default option (which behaves as shared) is broken.

Yes, the inheritance of propagation shared from the root breaks
the mounting of autofs trigger mounts within a bind mount that's
mounted onto a directory in the root file system. That's the only
case I've seen myself but there may be more.

It is a special case we don't come across that often.
I think that's the reason it went unnoticed for so long.

> 
> > The most common case is "slave" and the "private" option was only
> > included because it might be needed if people are using isolated
> > environments but TBH I'm not at all sure it could actually be used
> > for that case.
> > 
> > IIUC the change to the propagation of the root file system was
> > done to help with containers but turned out not to do what was
> > needed and was never reverted. So the propagation shared change
> > probably should have been propagation slave or not changed at
> > all.
> 
> I agree. I am also worried of the swelling /proc/mounts because
> of this.

I think that's the least of our problems with this.
The deadlock that results from an autofs trigger mount trying to
mount on the propagated trigger mount (that should not exist) is
the real problem.

> 
> > > > https://mirrors.edge.kernel.org/pub/linux/daemons/autofs/v5/patches-5.1.5/autofs-5.1.4-set-bind-mount-as-propagation-slave.patch
> > > > 
> > > > https://mirrors.edge.kernel.org/pub/linux/daemons/autofs/v5/patches-5.1.5/autofs-5.1.4-add-master-map-pseudo-options-for-mount-propagation.patch
> > > > 
> > > > It shouldn't be too difficult to back port them but they might
> > > > have other patch dependencies. I will help with that if you
> > > > need it.
> > > 
> > > My problem is not with the patch and the "private" or "slave"
> > > flag,
> > > but
> > > with the absence of it. We have the patch you mention in our
> > > repos.
> > 
> > Ha, play on words, "absence of it" and "we have it in our repos"
> 
> I meant, Absence of "private" or "slave" flags.

Yea, my little joke may have been in poor taste.

> 
> > Don't you mean the problem is that mount propagation isn't
> > set correctly automatically by automount.
> > 
> > > I am assuming that users are stupid and they will miss putting
> > > the
> > > flags
> > > in the auto.master file and wonder why when they try to access
> > > the
> > > directories
> > > the process hangs. In all, any user configuration should not hang
> > > the
> > > kernel.
> > 
> > I thought about that when I was working on those patches but,
> > at the time, I didn't think the propagation problem had started
> > when the root file system was set propagation shared at boot.
> > 
> > I still think changing the kernel propagation isn't the right
> > way to resolve it.
> > 
> > But I would be willing to add a configuration option to autofs
> > that when set would use propagation slave for all bind mounts
> > without the need to modify the master map. Given how long the
> > problem has been around I'm also tempted to make it default
> > to enabled.
> > 
> > I'm not sure yet what that would mean for the existing mount
> > options "shared" and "private" other than them possibly being
> > needed if the option is disabled, even with this I'm still not
> > sure a "shared" option is useful.
> > 
> > Isn't this automation your main concern?
> > 
> 
> My main concerns is a user space configuration should not hang
> the process.
> 
> This is a problem for people upgrading their kernel/systemd
> and finding their processes hanging.
> 
> I am fine with making the change in user space automount daemon
> keeping
> slave mounts as default, but then you would leave out a small
> security
> window where users can hang the accessing process by
> modifying/replacing
> automount.

The same could be said of replacing the kernel since the
problem has been around for so long.

Anyway, it does sound like it's worth putting time into
your suggestion of a kernel change.

Unfortunately I think it's going to take a while to work
out what's actually going on with the propagation and I'm
in the middle of some other pressing work right now.

Ian