On Thu, Jan 16, 2025 at 04:14:59AM +0000, Al Viro wrote:
> On Wed, Jan 15, 2025 at 10:56:08AM -0800, Boris Burkov wrote:
> > Hello,
> >
> > If we run the following C code:
> >
> > unshare(CLONE_NEWNS);
> > int fd = open("/dev/loop0", O_RDONLY);
> > unshare(CLONE_NEWNS);
> >
> > Then after the second unshare, the mount hierarchy created by the first
> > unshare is fully dereferenced and gets torn down, leaving the file
> > pointed to by fd with a broken dentry.
>
> No, it does not. dentry is just fine and so's mount - it is not
> attached to anything, but it's alive and well.
>
> > Specifically, subsequent calls to d_path on its path resolve to
> > "/loop0".
>
> > My question is:
> > Is this expected behavior with respect to mount reference counts and
> > namespace teardown?
>
> Yes. Again, mount is still alive; it is detached, but that's it.
>
> > If I mount a filesystem and have a running program with an open file
> > descriptor in that filesystem, I would expect unmounting that filesystem
> > to fail with EBUSY, so it stands to reason that the automatic unmount
> > that happens from tearing down the mount namespace of the first unshare
> > should respect similar semantics and either return EBUSY or at least
> > have the lazy umount behavior and not wreck the still referenced mount
> > objects.
>
> Lazy umount is precisely what is happening. Referenced mount object is
> there as long as it is referenced.

Thank you for your reply and explanations.

So in your opinion, what is the bug here?

btrfs started using d_path and checking that the device source file was
in /dev, to avoid putting nonsense like /proc/self/fd/3 into the mount
table, where it makes userspace fall over.
(https://bugzilla.suse.com/show_bug.cgi?id=1230641)

I'd be loath to call the userspace program hitting the
'unshare; open; unshare' sequence buggy, as we don't fail any of the
syscalls in a particularly sensible way.
And if you use unshare -m, you now have to vet that the program you call
doesn't use unshare itself?

You've taught me that d_path is working as intended in the face of the
namespace lifetime, so we can't rely on it to produce the "real"
(original?) path, in general.

So, to me, that leaves the bug as "btrfs shouldn't assume/validate that
device files will be in /dev." We can do the d_path resolution thing
anyway to cover the common case in the bugzilla, but shouldn't fail on
something like /loop0 when that is what we get out of d_path?

Boris
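
A rough userspace sketch of the "resolve, but don't fail" policy described
above. This is not btrfs code: both function names are hypothetical, and
falling back to the caller-supplied name is an assumption about what "not
failing" could look like, not something settled in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not a kernel API): true if path lies under /dev/. */
static bool path_is_under_dev(const char *path)
{
	return path != NULL && strncmp(path, "/dev/", 5) == 0;
}

/*
 * Hypothetical policy sketch: prefer the d_path-style resolved name when it
 * is under /dev. Otherwise (e.g. "/loop0" from a detached mount) keep the
 * name the caller supplied instead of returning an error. The fallback
 * choice is an assumption made for illustration.
 */
static const char *choose_device_name(const char *resolved,
				      const char *supplied)
{
	if (path_is_under_dev(resolved))
		return resolved;
	return supplied;
}
```

With this, a resolved "/dev/loop0" replaces a supplied "/proc/self/fd/3",
while a resolved "/loop0" leaves the supplied name untouched rather than
failing the mount.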