On Thu, Aug 12, 2021 at 07:36:24AM +0200, Michael Kerrisk (man-pages) wrote: > Hello Christian, > > Thanks for the answers. > > A couple of small queries still below. > > On 8/11/21 12:07 PM, Christian Brauner wrote: > > On Tue, Aug 10, 2021 at 11:06:52PM +0200, Michael Kerrisk (man-pages) wrote: > > [...] > > >>>>> EINVAL The mount that is to be ID mapped is not a > >>>>> detached/anonymous mount; that is, the mount is > >>>> > >>>> ??? > >>>> What is a the distinction between "detached" and "anonymous"? > >>>> Or do you mean them to be synonymous? If so, then let's use > >>>> just one term, and I think "detached" is preferable. > >>> > >>> Yes, they are synonymous here. I list both because detached can > >>> potentially be confusing. A detached mount is a mount that has not been > >>> visible in the filesystem. But if you attached it an then unmount it > >>> right after and keep the fd for the mountpoint open it's a detached > >>> mount purely on a natural language level, I'd argue. But it's not a > >>> detached mount from the kernel's view anymore because it has been > >>> exposed in the filesystem and is thus not detached anymore. > >>> But I do prefer "detached" to "anonymous" and that confusion is very > >>> unlikely to occur. > >> > >> Thanks. I made it "detached". Elsewhere, the page already explains > >> that a detached mount is one that: > >> > >> must have been created by calling open_tree(2) with the > >> OPEN_TREE_CLONE flag and it must not already have been > >> visible in the filesystem. > >> > >> Which seems a fine explanation. > >> > >> ???? > >> But, just a thought... "visible in the filesystem" seems not quite accurate. > >> What you really mean I guess is that it must not already have been > >> /visible in the filesystem hierarchy/previously mounted/something else/, > >> right? > > I suppose that I should have clarified that my main problem was > that you were using the word "filesystem" in a way that I find > unconventional/ambiguous. I mean, I normally take the term > "filesystem" to be "a storage system for folding files". > Here, you are using "filesystem" to mean something else, what > I might call like "the single directory hierarchy" or "the > filesystem hierarchy" or "the list of mount points". > > > A detached mount is created via the OPEN_TREE_CLONE flag. It is a > > separate new mount so "previously mounted" is not applicable. > > A detached mount is _related_ to what the MS_BIND flag gives you with > > mount(2). However, they differ conceptually and technically. A MS_BIND > > mount(2) is always visible in the fileystem when mount(2) returns, i.e. > > it is discoverable by regular path-lookup starting within the > > filesystem. > > > > However, a detached mount can be seen as a split of MS_BIND into two > > distinct steps: > > 1. fd_tree = open_tree(OPEN_TREE_CLONE): create a new mount > > 2. move_mount(fd_tree, <somewhere>): attach the mount to the filesystem > > > > 1. and 2. together give you the equivalent of MS_BIND. > > In between 1. and 2. however the mount is detached. For the kernel > > "detached" means that an anonymous mount namespace is attached to it > > which doen't appear in proc and has a 0 sequence number (Technically, > > there's a bit of semantical argument to be made that "attached" and > > "detached" are ambiguous as they could also be taken to mean "does or > > does not have a parent mount". This ambiguity e.g. appears in > > do_move_mount(). That's why the kernel itself calls it an "anonymous > > mount". However, an OPEN_TREE_CLONE-detached mount of course doesn't > > have a parent mount so it works.). > > > > For userspace it's better to think of detached and attached in terms of > > visibility in the filesystem or in a mount namespace. That's more > > straightfoward, more relevant, and hits the target in 90% of the cases. > > > > However, the better and clearer picture is to say that a > > OPEN_TREE_CLONE-detached mount is a mount that has never been > > move_mount()ed. Which in turn can be defined as the detached mount has > > never been made visible in a mount namespace. Once that has happened the > > mount is irreversibly an attached mount. > > > > I keep thinking that maybe we should just say "anonymous mount" > > everywhere. So changing the wording to: > > I'm not against the word "detached". To user space, I think it is a > little more meaningful than "anonymous". For the moment, I'll stay with > "detached", but if you insist on "anonymous", I'll probably change it. No, sounds good. > > > [...] > > EINVAL The mount that is to be ID mapped is not an anonymous mount; > > that is, the mount has already been visible in a mount namespace. > > I like that text *a lot* better! Thanks very much for suggesting > wordings. It makes my life much easier. > > I've made the text: > > EINVAL The mount that is to be ID mapped is not a detached > mount; that is, the mount has not previously been > visible in a mount namespace. Sounds good. > > > [...] > > The mount must be an anonymous mount; that is, it must have been > > created by calling open_tree(2) with the OPEN_TREE_CLONE flag and it > > must not already have been visible in a mount namespace, i.e. it must > > not have been attached to the filesystem hierarchy with syscalls such > > as move_mount() syscall. > > And that too! I've made the text: > > • The mount must be a detached mount; that is, it must have > been created by calling open_tree(2) with the > OPEN_TREE_CLONE flag and it must not already have been > visible in a mount namespace. (To put things another way: > the mount must not have been attached to the filesystem > hierarchy with a system call such as move_mount(2).) Sounds good. > > > [...] > > > > (I'm using the formulation "with syscalls such as move_mount()" to > > future proof this. :)). > > Fair enough. > > >>>>> EXAMPLES > >>>> > >>>> ??? > >>>> Do you have a (preferably simple) example piece of code > >>>> somewhere for setting up an ID mapped mount? > >> > >> ???? > >> I guess the best example is this: > >> https://github.com/brauner/mount-idmapped/ > >> right? > > > > Ah yes, sorry. I forgot to answer that yesterday. I sent you links via > > another medium but I repeat it here. > > There are two places. The link you have here is a private repo. But I've > > also merged a program alongside the fstests testsuite I merged: > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/mount-idmapped.c > > which should be nicer and has seen reviews by Amir and Christoph. > > Thanks. > > [...] > > >>>>> int fd_tree = open_tree(-EBADF, source, > >>>>> OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | > >>>>> AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0)); > >>>> > >>>> ??? > >>>> What is the significance of -EBADF here? As far as I can tell, it > >>>> is not meaningful to open_tree()? > >>> > >>> I always pass -EBADF for similar reasons to [2]. Feel free to just use -1. > >> > >> ???? > >> But here, both -EBADF and -1 seem to be wrong. This argument > >> is a dirfd, and so should either be a file descriptor or the > >> value AT_FDCWD, right? > > > > [1]: In this code "source" is expected to be absolute. If it's not > > absolute we should fail. This can be achieved by passing -1/-EBADF, > > afaict. > > D'oh! Okay. I hadn't considered that use case for an invalid dirfd. > (And now I've done some adjustments to openat(2),which contains a > rationale for the *at() functions.) > > So, now I understand your purpose, but still the code is obscure, > since > > * You use a magic value (-EBADF) rather than (say) -1. > * There's no explanation (comment about) of the fact that you want > to prevent relative pathnames. > > So, I've changed the code to use -1, not -EBADF, and I've added some > comments to explain that the intent is to prevent relative pathnames. > Okay? Sounds good. > > But, there is still the meta question: what's the problem with using > a relative pathname? Nothing per se. Ok, you asked so it's your fault: When writing programs I like to never use relative paths with AT_FDCWD because. Because making assumptions about the current working directory of the calling process is just too easy to get wrong; especially when pivot_root() or chroot() are in play. My absolut preference (joke intended) is to open a well-known starting point with an absolute path to get a dirfd and then scope all future operations beneath that dirfd. This already works with old-style openat() and _very_ cautious programming but openat2() and its resolve-flag space have made this **chef's kiss**. If I can't operate based on a well-known dirfd I use absolute paths with a -EBADF dirfd passed to *at() functions. Christian