"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Andrey, > > On 07/22/2016 08:25 PM, Andrey Vagin wrote: >> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages) >> <mtk.manpages@xxxxxxxxx> wrote: >>> Hi Andrey, >>> >>> >>> On 07/21/2016 11:06 PM, Andrew Vagin wrote: >>>> >>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages) >>>> wrote: >>>>> >>>>> Hi Andrey, >>>>> >>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote: >>>> >>>> >>>> <snip> >>>> >>>>> >>>>> Could you add here an of the API in detail: what do these FDs refer to, >>>>> and how do you use them to solve the use case? And could you you add >>>>> that info to the commit messages please. >>>> >>>> >>>> Hi Michael, >>>> >>>> A patch for man-pages is attached. It adds the following text to >>>> namespaces(7). >>>> >>>> Since Linux 4.X, the following ioctl(2) calls are supported for names‐ >>>> pace file descriptors. The correct syntax is: >>>> >>>> fd = ioctl(ns_fd, ioctl_type); >>>> >>>> where ioctl_type is one of the following: >>>> >>>> NS_GET_USERNS >>>> Returns a file descriptor that refers to an owning user names‐ >>>> pace. >>>> >>>> NS_GET_PARENT >>>> Returns a file descriptor that refers to a parent namespace. >>>> This ioctl(2) can be used for pid and user namespaces. For user >>>> namespaces, NS_GET_PARENT and NS_GET_USERNS have the same mean‐ >>>> ing. > > For each of the above, I think it is worth mentioning that the > close-on-exec flag is set for the returned file descriptor. Hmm. That is an odd default. >>>> >>>> In addition to generic ioctl(2) errors, the following specific ones can >>>> occur: >>>> >>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. >>>> >>>> EPERM The requested namespace is outside of the current namespace >>>> scope. > > Perhaps add "and the caller does not have CAP_SYS_ADMIN" in the initial > user namespace"? Having looked at that bit of code I don't think capabilities really have a role to play. 
>>>>    ENOENT ns_fd refers to the init namespace.
>>>
>>>
>>> Thanks for this. But still part of the question remains unanswered.
>>> How do we (in user-space) use the file descriptors to answer any of
>>> the questions that this patch series was designed to solve? (This
>>> info should be in the commit message and the man-pages patch.)
>>
>> I'm sorry, but I am not sure that I understand what you are asking.
>>
>> Here are the original questions:
>> Someone else then asked me a question that led me to wonder about
>> generally introspecting on the parental relationships between user
>> namespaces and the association of other namespaces types with user
>> namespaces. One use would be visualization, in order to understand the
>> running system. Another would be to answer the question I already
>> mentioned: what capability does process X have to perform operations
>> on a resource governed by namespace Y?
>>
>> Here is an example which shows how we can get the owning namespace
>> inode number by using these ioctls.
>>
>> $ ls -l /proc/13929/ns/pid
>> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>
>> $ ./nsowner /proc/13929/ns/pid
>> user:[4026532227]
>>
>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>
>> The nsowner tool is compiled from this code:
>>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <sys/ioctl.h>
>> #include <linux/nsfs.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     char buf[128], path[] = "/proc/self/fd/0123456789";
>>     int ns, uns, ret;
>>
>>     /* Open the namespace file, e.g. /proc/PID/ns/pid. */
>>     ns = open(argv[1], O_RDONLY);
>>     if (ns < 0)
>>         return 1;
>>
>>     /* Get a descriptor for the owning user namespace. */
>>     uns = ioctl(ns, NS_GET_USERNS);
>>     if (uns < 0)
>>         return 1;
>>
>>     /* The /proc/self/fd symlink target names the namespace. */
>>     snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>     ret = readlink(path, buf, sizeof(buf) - 1);
>>     if (ret < 0)
>>         return 1;
>>     buf[ret] = 0;
>>
>>     printf("%s\n", buf);
>>
>>     return 0;
>> }
>
> So, from my point of view, the important piece that was missing from
> your commit message was the note to use readlink("/proc/self/fd/%d")
> on the returned FDs.
> I think that detail needs to be part of the
> commit message (and also the man page text). I think it would even be
> helpful to include the above program as part of the commit message:
> it helps people more quickly grasp the API.

Please, please make the standard way to compare these things fstat.
That is much less magic than a symlink, and a little more future proof.
Possibly even kcmp.

At some point we will care about migrating a sub-container and we may
have to have some minor changes.

Eric