Re: Possible bug with open between unshare(CLONE_NEWNS) calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 15, 2025 at 10:56:08AM -0800, Boris Burkov wrote:
> Hello,
> 
> If we run the following C code:
> 
> unshare(CLONE_NEWNS);
> int fd = open("/dev/loop0", O_RDONLY)
> unshare(CLONE_NEWNS);
> 
> Then after the second unshare, the mount hierarchy created by the first
> unshare is fully dereferenced and gets torn down, leaving the file
> pointed to by fd with a broken dentry.
> 
> Specifically, subsequent calls to d_path on its path resolve to
> "/loop0". I was able to confirm this with drgn, and it has caused an
> unexpected failure in mkosi/systemd-repart attempting to mount a btrfs
> filesystem through such an fd, since btrfs uses d_path to resolve the
> source device file path fully.
> 
> I confirmed that this is definitely due to the first unshare mount
> namespace going away by:
> 1. printks/bpftrace the copy_root path in the kernel
> 2. rewriting my test program to fork after the first unshare to keep
> that namespace referenced. In this case, the fd is not broken after the
> second unshare.
> 
> 
> My question is:
> Is this expected behavior with respect to mount reference counts and
> namespace teardown?
> 
> If I mount a filesystem and have a running program with an open file
> descriptor in that filesystem, I would expect unmounting that filesystem
> to fail with EBUSY, so it stands to reason that the automatic unmount
> that happens from tearing down the mount namespace of the first unshare
> should respect similar semantics and either return EBUSY or at least
> have the lazy umount behavior and not wreck the still referenced mount
> objects.
> 
> If this behavior seems like a bug to people better versed in the
> expected behavior of namespaces, I would be happy to work on a fix.

It's expected as Al already said. And is_good_dev_path()
looks pretty hacky...

Wouldn't something like:

bool is_devtmpfs(const struct super_block *sb)
{
        return sb->s_type == &dev_fs_type;
}

and then:

        ret = kern_path(dev_path, 0, &path);
        if (ret)
                goto out;

	if (is_devtmpfs(path->mnt->mnt_sb))
		// something something

be enough? Or do you specifically need to care where devtmpfs is
mounted? The current check means that anything that mounts devtmpfs
somewhere other than /dev would fail that check.

Of course, any standard Linux distribution will mount devtmpfs at /dev
so it probably won't matter in practice. And contains may make /dev a
tmpfs mount and bind-mount device nodes in from the host's devtmpfs so
that would work too with this check.

In other words, I don't get why the /dev prefix check gets you anything?
If you just verify that the device node is located on devtmpfs you
should be good.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux