Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> "Serge E. Hallyn" <serue@xxxxxxxxxx> writes:
>
> > >> (3.2) mnt namespace maybe ?
> >
> > I think the last one is the way to go.
> >
> > mnt_namespace points to mq_ns.
> >
> > At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the
> > parent's mq_ns.
> >
> > If a task does
> >     mount -o newinstance -t mqueue none /dev/mqueue
> > then its current->nsproxy->mnt_namespace->mqns is switched
> > to point to a new instance of the mq_ns.
> >
> > mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the
> > devpts fs.
> >
> > When a task does mq_open(name, flag), then name is in the mqueuefs
> > found in current->nsproxy->mnt_namespace->mqns.
> >
> > But if a task does
> >
> >     clone(CLONE_NEWMNT);
> >     mount --move /dev/mqueue /oldmqueue
> >     mount -o newinstance -t mqueue none /dev/mqueue
> >
> > then that task can find files for the old mqueuefs under
> > /oldmqueue, while mq_open() uses /dev/mqueue since that's
> > what it finds through its mnt_namespace.
>
> Serge if we can make the lookup a pure mount namespace operation
> i.e. a well known path.  Then I don't have any problems with it.
> Otherwise it looks like abuse of the mount namespace.

Why?

Actually it may work to just put mq_ns straight in the nsproxy.  So
let's see:

    mq_open(name, flag):
        opens name under the dentry pointed to by
        current->nsproxy->mq_ns->mq_dentry
    mount -t mqueue none /dev/mqueue:
        either returns -EBUSY or just mounts
        current->nsproxy->mq_ns->mq_sb under /dev/mqueue
    mount -o newinstance -t mqueue none /dev/mqueue:
        mounts a new mq_ns instance under /dev/mqueue

While doing

    mount --make-rshared /vs1
    mount --bind /dev/mqueue /vs1/dev/mqueue
    create_a_new_container_chrooted_at(/vs1)
        mount -o newinstance -t mqueue none /dev/mqueue

would allow the host to see the child's /dev/mqueue under
/vs1/dev/mqueue while having its own mqueuefs continue to be
mounted under /dev/mqueue.

> In particular.  The best approximation I have is to change the
> kernel to simply lookup "/dev/mqueue" and if not found fallback
> to the initial kernel instance.

Having the kernel walk a hard-coded pathname to find it?  That I really
don't like.

> I'm staring at the code as I really haven't looked at it enough
> but it sure looks like we can transform it into a proper filesystem
> with just a touch of backwards compatibility logic.
> - put the current mq_namespace in the superblock.
> - Have open/unlink lookup "/dev/mqueue" to find the filesystem
>   if nothing is found fallback to the internal mount otherwise error.
> - Possibly put the tunables in a subdirectory? and
>   bind mount that subdirectory on top of /proc/sys/fs/mqueue/
>
> I'm not too thrilled about the tunables but still.

If mq_ns is stored under nsproxy, then so long as the task has remounted
/proc for its pidns anyway, we should be able to show the right tunables
under /proc as well, right?

> Are there any security holes or other oddness we would encounter
> if we did that?
>
> If we can turn the posix mqueue stuff into an honest to goodness
> filesystem then we completely avoid nsproxy,

I am assuming that mq_open() is posix-defined?  So the only way we could
do that is, as you suggest, to have the kernel look up the hard-coded
/dev/mqueue path which IMO is a non-starter, and not worth it to avoid
nsproxy.

> and have something
> that is much nicer to deal with long term.
>
> Eric

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
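
For illustration only, here is a toy user-space model of the nsproxy
variant discussed in the message: mq_open() resolves through the task's
mq_ns, while mounts only decide where an instance is visible in the
tree.  It is a sketch, not kernel code.  Every struct and function name
(struct task, mount_mqueue(), move_mount(), ns_at_path(), mq_open_ns())
is hypothetical, and it assumes that a newinstance mount also switches
the task's mq_ns, which the nsproxy variant in the message does not
spell out.

```c
/* Toy user-space model of the nsproxy->mq_ns idea.  NOT kernel code:
 * every struct and function below is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NS     8
#define MAX_MOUNTS 8

struct mq_ns { int id; };            /* stands in for mq_sb + mq_dentry */

struct mount_entry {
    char path[64];
    struct mq_ns *ns;                /* mqueuefs instance visible at path */
};

struct task {
    struct mq_ns *mq_ns;             /* models current->nsproxy->mq_ns */
    struct mount_entry mounts[MAX_MOUNTS];
    int nr_mounts;
};

static struct mq_ns ns_pool[MAX_NS];
static int ns_used;

/* Hand out a fresh mqueue instance from a small static pool. */
struct mq_ns *new_mq_ns(void)
{
    assert(ns_used < MAX_NS);
    ns_pool[ns_used].id = ns_used;
    return &ns_pool[ns_used++];
}

/* Models: mount [-o newinstance] -t mqueue none <path> */
void mount_mqueue(struct task *t, const char *path, int newinstance)
{
    struct mount_entry *m;

    assert(t->nr_mounts < MAX_MOUNTS);
    if (newinstance)
        t->mq_ns = new_mq_ns();      /* assumption: mq_ns is switched too */
    m = &t->mounts[t->nr_mounts++];
    snprintf(m->path, sizeof(m->path), "%s", path);
    m->ns = t->mq_ns;                /* plain mount exposes the current one */
}

/* Models: mount --move <from> <to>.  Returns 0, or -1 if nothing is
 * mounted at from. */
int move_mount(struct task *t, const char *from, const char *to)
{
    int i;

    for (i = t->nr_mounts - 1; i >= 0; i--) {
        if (strcmp(t->mounts[i].path, from) == 0) {
            snprintf(t->mounts[i].path, sizeof(t->mounts[i].path), "%s", to);
            return 0;
        }
    }
    return -1;
}

/* Instance visible at path (most recent mount wins), or NULL. */
struct mq_ns *ns_at_path(const struct task *t, const char *path)
{
    int i;

    for (i = t->nr_mounts - 1; i >= 0; i--)
        if (strcmp(t->mounts[i].path, path) == 0)
            return t->mounts[i].ns;
    return NULL;
}

/* mq_open() lookup: goes through the task's mq_ns, never through a path. */
struct mq_ns *mq_open_ns(const struct task *t)
{
    return t->mq_ns;
}
```

Replaying the sequence from the message in this model (mount at
/dev/mqueue, move it to /oldmqueue, then a newinstance mount at
/dev/mqueue) leaves the old instance reachable under /oldmqueue while
mq_open_ns() returns the new one, without any pathname walk.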