On Sun, 2014-03-02 at 15:55 +0100, Donald Buczek wrote:
> On 02.03.2014 08:10, Ian Kent wrote:
> > On Sun, 2014-03-02 at 10:22 +0800, Ian Kent wrote:
> >> On Fri, 2014-02-28 at 08:29 -0500, Alexander Viro wrote:
> >>> On Fri, Feb 28, 2014 at 01:12:58PM +0100, Donald Buczek wrote:
> >>>
> >>>> Obviously, "cleared mounted on dentry" is missing.
> >>>>
> >>>> It looks like we enter put_mountpoint() but don't get to
> >>>> dentry->d_flags &= ~DCACHE_MOUNTED;
> >>>>
> >>>> mp->m_count is not zero probably.
> >>>>
> >>>> What does it mean? The mount is still locked but not in the mount hash?
> >>> No, it means that something else is mounted on the same dentry (in another
> >>> part of mount tree, obviously).
> >>>
> >>> If you mount the same fs on two different mountpoints, e.g.
> >>>     mount /dev/sda1 /mnt
> >>>     mount /dev/sda1 /tmp/foo
> >>> you will have the same dentries seen in two places. Now,
> >>>     mount /dev/sdb11 /mnt/a
> >>>     mount /dev/sdc5 /tmp/foo/a
> >>>
> >>> and you've got two different filesystems mounted on two different places
> >>> (/mnt/a and /tmp/foo/a). These two places have different vfsmounts,
> >>> but the same dentry. struct mountpoint is associated with dentry, so
> >>> it's also the same for both. And it serves as a mountpoint for two
> >>> vfsmounts - one for fs from sdb11, another for fs from sdc5.
> >>>
> >>> Now umount /mnt/a; one of those two vfsmounts is gone now. struct mountpoint
> >>> survives, of course, and dentry is *still* a mountpoint. sdc5 is still
> >>> mounted on /tmp/foo/a, after all...
> > Good example but for autofs file systems doesn't this amount to saying
> > it's been bound somewhere else?
> >
> > Illegal as far as autofs is concerned because an autofs mount is
> > strictly associated with a path defined by its map.
> >
> > And, yes, bind mounting an autofs file system elsewhere isn't vetoed by
> > the kernel.
> >
> > This makes me start thinking about implications wrt. containers ....
> >
> >> Ahh, right ...
> >> I'll need to think about my use (misuse) of
> >> d_mountpoint().
> > So maybe I don't need to worry about this just yet.

I think you've hit on almost all the current problems I'm struggling
with and added to them ;)

>
> I think you should, because exactly this is the bug.
> d_mountpoint(dentry) just says, that we have a struct mountpoint for the
> dentry. It does not say, that the path is mounted in the current
> namespace. The struct mountpoint might exist, because the path is
> mounted in other namespaces but not ours.

Yes, and this adds a new case to the list of problems.

>
> The problem at our site is clear now:
>
> We have only one service with PrivateTmp=yes which is colord.service.
> And here is the missing mount:
>
> > root:kasslerbraten:/lib/systemd/system/# ps -Af|fgrep colord
> > root 7670 1 0 Feb28 ? 00:00:00 /usr/lib/colord/colord
> > root 7897 7329 0 14:46 pts/8 00:00:00 fgrep colord
> > root:kasslerbraten:/lib/systemd/system/# cat /proc/7670/mounts|grep mariux32
> > pille:/amd/pille/1/project/mariux32 /project/mariux32 nfs
> > rw,nosuid,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=141.14.28.250,mountvers=3,mountport=56263,mountproto=udp,local_lock=none,addr=141.14.28.250
> > 0 0
>
> colord.service is dbus-started. So it is started quite randomly and
> depending on user usage pattern, mostly but not exclusively on
> workstations. That is exactly how we've seen the bug to appear.
>
> When the service is started, systemd uses unshare(CLONE_NEWNS) to clone
> the namespace. This new namespace inherits existing mounts, including
> automounted ones.
> These mounts might eventually expire at a later time. When this occurs,
> they are dismounted from the automount daemon's namespace, which is the
> global, pid 1 namespace. But because they are still mounted in another
> namespace, the dentry stays flagged as DCACHE_MOUNTED, which prevents
> autofs from remounting it on access.
> The mount, however, just exists in
> another namespace and is useless for anybody else.

Useless yes, but there is currently no way to mount something so that it
won't be propagated.

No, MS_PRIVATE says "I'm private don't propagate my children".

To add a flag to do this isn't a simple task either AFAICS.

And then there are those that explicitly want the propagation and expect
it to work. I think they will eventually be disappointed.

>
> Final proof that this is the true story:
>
> > root:kasslerbraten:/lib/systemd/system/# ls /project/mariux32
> > ls: cannot open directory /project/mariux32: Too many levels of
> > symbolic links
> > root:kasslerbraten:/lib/systemd/system/# kill -9 7670
> > root:kasslerbraten:/lib/systemd/system/# ls /project/mariux32
> > beeroot home i686 svnroot
> > root:kasslerbraten:/lib/systemd/system/#
>
> Of course, I can easily work around that in our environment (eg. just
> remove PrivateTmp=yes from the service). So I'm pretty sure, it will
> work for me now.
> The bug, however, is in autofs. systemd is doing perfectly legal
> user-mode things.
>
> Perhaps autofs should use lookup_mnt() to decide along this pattern:
>
>     if ( dentry->d_flags & DCACHE_MOUNTED && lookup_mnt(path) ) {
>         /* mounted */
>     } else {
>         /* not mounted */
>     }

Also, not as simple as you might think.

First, lookup_mnt() isn't exported and I believe the preference is that
that doesn't change. But follow_down_one() is exported and could be
used.

Next, it would involve changing the function signature of a dentry
operation function. That function could be used by other modules that we
don't know about and they would break.

>
> That doesn't solve the problem, however, that mounts cloned by a
> unshare(CLONE_NEWNS) would never expire. Also there is another bug
> somewhere, because I see, that the mount, visible to the
> /usr/lib/colord/colord process was logged as "unmounted" in the nfs
> server when it expired in the global namespace.
> So I doubt it would be
> working even for that process. So possibly automounted mounts shouldn't
> be cloned at all? Together with chroot or pivot_root the semantics would
> be more than unclear anyway. Your problem now :-)

Hehe, like I said some people are going to be disappointed.

There's just one question about this that remains.

Assuming systemd is setting "/" shared, what happens if
"mount --make-rprivate /" is run before autofs is started?

So if you can spend a little more time on this, an answer to this would
be helpful.

>
> Thanks for your help with this!

Actually, thank you.

This investigation has given me quite a bit of new insight into the
current difficulties I have with namespace handling.

Ian