On Thu, Jan 23, 2025 at 11:40:41PM -0600, Mike Baynton wrote: > On 1/23/25 13:21, Christian Brauner wrote: > > On Wed, Jan 22, 2025 at 10:18:17PM -0600, Mike Baynton wrote: > >> Hi, > >> I've been eagerly awaiting the arrival of lowerdir+ by file handle, as > >> it looks likely to be well-suited to simplifying the task a container > >> runtime must take on in order to provide a set of properly idmapped > >> lower layers for a user namespaced container. Currently in containerd, > >> this is done by creating bindmounts for each required lower layer in > >> order to apply idmapping to them. Each of these bindmounts must be > >> briefly attached to some path-resolvable mountpoint before the overlay > >> is created, which seems less than ideal and is contributing to some > >> cleanup headaches e.g. when other software that may be present jumps on > >> the new mount and starts security scanning it or whatnot. > >> > >> In order to better isolate the idmap bindmounts I was hoping to do > >> something like: > >> > >> ovl_ctx = fsopen("overlay", FSOPEN_CLOEXEC); > >> > >> opfd = open_tree(-1, "/path/to/unmapped/layer", > >> OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC); > >> mount_setattr(opfd, "", AT_EMPTY_PATH, /* attrs to set a userns_fd */); > >> dfd = openat(opfd, ".", O_DIRECTORY, mode); > > > > Unless I forgot detaile, openat() shouldn't be needed as speciyfing > > layers via O_PATH file descriptors should just work. > > O_PATH ones currently result in EBADF, iirc just because fsconfig with > FSCONFIG_SET_FD looks up the file descriptor in a way that masks O_PATH. > This took some time to work out too, but doesn't strike me as a huge > deal. Although I suppose it's one of those things that if it were > improved far down the road would probably lead to next to nobody > removing the openat(). Oh right. We should be able to enable FSONFIG_SET_FD to accept O_PATH file descriptors. To not break existing users we need do introduce: diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index 4b4bfef6f053..e160e7c61e4b 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -55,6 +55,7 @@ enum fs_value_type { fs_value_is_blob, /* Value is a binary blob */ fs_value_is_filename, /* Value is a filename* + dirfd */ fs_value_is_file, /* Value is a file* */ + fs_value_is_file_fmode_path, /* Value is a file* */ }; /* diff --git a/include/linux/fs_parser.h b/include/linux/fs_parser.h index 3cef566088fc..17ba4951298b 100644 --- a/include/linux/fs_parser.h +++ b/include/linux/fs_parser.h @@ -134,6 +134,7 @@ static inline bool fs_validate_description(const char *name, #define fsparam_bdev(NAME, OPT) __fsparam(fs_param_is_blockdev, NAME, OPT, 0, NULL) #define fsparam_path(NAME, OPT) __fsparam(fs_param_is_path, NAME, OPT, 0, NULL) #define fsparam_fd(NAME, OPT) __fsparam(fs_param_is_fd, NAME, OPT, 0, NULL) +#define fsparam_path_fd(NAME, OPT) __fsparam(fs_param_is_path_fd, NAME, OPT, 0, NULL) #define fsparam_file_or_string(NAME, OPT) \ __fsparam(fs_param_is_file_or_string, NAME, OPT, 0, NULL) #define fsparam_uid(NAME, OPT) __fsparam(fs_param_is_uid, NAME, OPT, 0, NULL) and so that we don't break code and autofs FSCONFIG_SET_FD usage. Both want non O_PATH fds. But otherwise I don't see an issue with this.