Add a manual page to document the fsopen(), fspick() and fsmount() system
calls.

Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
---

 man2/fsmount.2 |    1
 man2/fsopen.2  |  357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 man2/fspick.2  |    1
 3 files changed, 359 insertions(+)
 create mode 100644 man2/fsmount.2
 create mode 100644 man2/fsopen.2
 create mode 100644 man2/fspick.2

diff --git a/man2/fsmount.2 b/man2/fsmount.2
new file mode 100644
index 000000000..2bf59fc3e
--- /dev/null
+++ b/man2/fsmount.2
@@ -0,0 +1 @@
+.so man2/fsopen.2
diff --git a/man2/fsopen.2 b/man2/fsopen.2
new file mode 100644
index 000000000..1bc761ab4
--- /dev/null
+++ b/man2/fsopen.2
@@ -0,0 +1,357 @@
+'\" t
+.\" Copyright (c) 2018 David Howells <dhowells@xxxxxxxxxx>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH FSOPEN 2 2018-06-07 "Linux" "Linux Programmer's Manual"
+.SH NAME
+fsopen, fsmount, fspick \- Handle filesystem (re-)configuration and mounting
+.SH SYNOPSIS
+.nf
+.B #include <sys/types.h>
+.br
+.B #include <sys/mount.h>
+.br
+.B #include <unistd.h>
+.br
+.BR "#include <fcntl.h> " "/* Definition of AT_* constants */"
+.PP
+.BI "int fsopen(const char *" fsname ", unsigned int " flags );
+.PP
+.BI "int fsmount(int " fd ", unsigned int " flags ", unsigned int " ms_flags );
+.PP
+.BI "int fspick(int " dirfd ", const char *" pathname ", unsigned int " flags );
+.fi
+.PP
+.IR Note :
+There are no glibc wrappers for these system calls.
+.SH DESCRIPTION
+.PP
+.BR fsopen ()
+creates a new filesystem configuration context within the kernel for the
+filesystem named in the
+.I fsname
+parameter and attaches it to a file descriptor, which it then returns. The
+file descriptor can be marked close-on-exec by setting
+.B FSOPEN_CLOEXEC
+in flags.
+.PP
+The
+file descriptor can then be used to configure the desired filesystem parameters
+and security parameters by using
+.BR write (2)
+to pass parameters to it and then writing a command to actually create the
+filesystem representation.
+.PP
+The file descriptor also serves as a channel by which more comprehensive error,
+warning and information messages may be retrieved from the kernel using
+.BR read (2).
+.PP
+Once the kernel's filesystem representation has been created, it can be queried
+by calling
+.BR fsinfo (2)
+on the file descriptor. fsinfo() will spot that the target is actually a
+creation context and look inside that.
+.PP
+.BR fsmount ()
+can then be called to create a mount object that refers to the newly created
+filesystem representation, with the propagation and mount restrictions to be
+applied specified in
+.IR ms_flags .
+The mount object is then attached to a new file descriptor that looks like one
+created by
+.BR open "(2) with " O_PATH " or " open_tree (2).
+This can be passed to
+.BR move_mount (2)
+to attach the mount object to a mountpoint, thereby completing the process.
+.PP
+The file descriptor returned by fsmount() is marked close-on-exec if
+FSMOUNT_CLOEXEC is specified in
+.IR flags .
+.PP
+After fsmount() has completed, the context created by fsopen() is reset and
+moved to reconfiguration state, allowing the new superblock to be reconfigured.
+.PP
+.BR fspick ()
+creates a new filesystem context within the kernel, attaches to it the
+superblock specified by
+.IR dirfd ", " pathname " and " flags ,
+puts the context into the reconfiguration state and attaches it to a new
+file descriptor that can then be parameterised with
+.BR write (2)
+exactly the same as for the context created by fsopen() above.
+.PP
+.I flags
+is an OR'd together mask of
+.B FSPICK_CLOEXEC
+which indicates that the returned file descriptor should be marked
+close-on-exec and
+.BR FSPICK_SYMLINK_NOFOLLOW ", " FSPICK_NO_AUTOMOUNT " and " FSPICK_EMPTY_PATH
+which control the pathwalk to the target object (see below).
+
+.\"________________________________________________________
+.SS Writable Command Interface
+Superblock (re-)configuration is achieved by writing command strings to the
+context file descriptor using
+.BR write (2).
+Each string is prefixed with a specifier indicating the class of command
+being specified. The available commands include:
+.TP
+\fB"o <option>"\fP
+Specify a filesystem or security parameter.
+.I <option>
+is typically a key or key=val format string. Since the length of the option is
+given to write(), the option may include any sort of character, including
+spaces and commas or even binary data.
+.TP
+\fB"s <name>"\fP
+Specify a device file, network server or other source specification.
+This may be optional, depending on the filesystem, and it may be possible to
+provide more than one of them to a filesystem.
+.TP
+\fB"x create"\fP
+End the filesystem configuration phase and try to create a representation in
+the kernel with the parameters specified. After this, the context is shifted
+to the mount-pending state waiting for an fsmount() call to occur.
+.TP
+\fB"x reconfigure"\fP
+End a filesystem reconfiguration phase and try to apply the parameters to the
+filesystem representation. After this, the context gets reset and put back to
+the start of the reconfiguration phase again.
+.PP
+With this interface, option strings are not limited to 4096 bytes, either
+individually or in sum, and they are also not restricted to text-only options.
+Further, errors may be given individually for each option and not aggregated or
+dumped into the kernel log.
+
+.\"________________________________________________________
+.SS Message Retrieval Interface
+The context file descriptor may be queried for message strings at any time by
+calling
+.BR read (2)
+on the file descriptor. This will return formatted messages that are prefixed
+to indicate their class:
+.TP
+\fB"e <message>"\fP
+An error message string was logged.
+.TP
+\fB"i <message>"\fP
+An informational message string was logged.
+.TP
+\fB"w <message>"\fP
+A warning message string was logged.
+.PP
+Messages are removed from the queue as they're read.
+
+.\"________________________________________________________
+.SH EXAMPLES
+To illustrate the process, here's an example whereby this can be used to mount
+an ext4 filesystem on /dev/sdb1 onto /mnt. Note that the example ignores the
+fact that
+.BR write (2)
+has a length parameter and that errors might occur.
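+.PP
+In a real program, the length and the error check that the example leaves
+out would have to be supplied. A minimal sketch of one way to do that
+follows; the helper name fs_command() is hypothetical and not part of the
+interface described in this page, and treating a short write as EIO is an
+assumption rather than documented behaviour. The abbreviated example itself
+comes after the sketch.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the interface described in this page):
 * pass one command string to a context file descriptor with an explicit
 * length, and report a failed or short write as an error.
 *
 * Returns 0 on success, or -1 with errno set.
 */
static int fs_command(int fd, const char *cmd)
{
	size_t len = strlen(cmd);
	ssize_t n = write(fd, cmd, len);

	if (n < 0)
		return -1;	/* errno was set by write(2) */
	if ((size_t)n != len) {
		errno = EIO;	/* assumption: a short write counts as failure */
		return -1;
	}
	return 0;
}
```

+.PP
+A caller could then write, for instance, "if (fs_command(sfd, "o noatime") < 0)"
+and use
+.BR read (2)
+on the same descriptor to look for a queued message. Because the sketch
+measures the string with strlen(), it cannot carry options containing NUL
+bytes; a variant taking an explicit length would be needed for binary data.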
+.PP
+.in +4n
+.nf
+sfd = fsopen("ext4", FSOPEN_CLOEXEC);
+write(sfd, "s /dev/sdb1");
+write(sfd, "o noatime");
+write(sfd, "o acl");
+write(sfd, "o user_attr");
+write(sfd, "o iversion");
+write(sfd, "x create");
+fsinfo(sfd, NULL, ...);
+mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME);
+move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.fi
+.in
+.PP
+Here, an ext4 context is created first and attached to sfd. This is then told
+where its source will be, given a set of options and then created.
+.BR fsinfo (2)
+can then be used to query the filesystem. Then fsmount() is called to create a
+mount object and
+.BR move_mount (2)
+is called to attach it to its intended mountpoint.
+.PP
+And here's an example of mounting from an NFS server:
+.PP
+.in +4n
+.nf
+sfd = fsopen("nfs", 0);
+write(sfd, "s example.com/pub/linux");
+write(sfd, "o nfsvers=3");
+write(sfd, "o rsize=65536");
+write(sfd, "o wsize=65536");
+write(sfd, "o rdma");
+write(sfd, "x create");
+mfd = fsmount(sfd, 0, MS_NODEV);
+move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.fi
+.in
+.PP
+Reconfiguration can be achieved by:
+.PP
+.in +4n
+.nf
+sfd = fspick(AT_FDCWD, "/mnt", FSPICK_NO_AUTOMOUNT | FSPICK_CLOEXEC);
+write(sfd, "o ro");
+write(sfd, "x reconfigure");
+.fi
+.in
+.PP
+or:
+.PP
+.in +4n
+.nf
+sfd = fsopen(...);
+...
+mfd = fsmount(sfd, ...);
+...
+write(sfd, "o ro");
+write(sfd, "x reconfigure");
+.fi
+.in
+
+
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.SH RETURN VALUE
+On success, all three functions return a file descriptor. On error, \-1 is
+returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The error values given below result from filesystem type independent
+errors.
+Each filesystem type may have its own special errors and its
+own special behavior.
+See the Linux kernel source code for details.
+.TP
+.B EACCES
+A component of a path was not searchable.
+(See also
+.BR path_resolution (7).)
+.TP
+.B EACCES
+Mounting a read-only filesystem was attempted without giving the
+.B MS_RDONLY
+flag.
+.TP
+.B EACCES
+The block device
+.I source
+is located on a filesystem mounted with the
+.B MS_NODEV
+option.
+.\" mtk: Probably: write permission is required for MS_BIND, with
+.\" the error EPERM if not present; CAP_DAC_OVERRIDE is required.
+.TP
+.B EBUSY
+.I source
+cannot be reconfigured read-only, because it still holds files open for
+writing.
+.TP
+.B EFAULT
+One of the pointer arguments points outside the user address space.
+.TP
+.B EINVAL
+.I source
+had an invalid superblock.
+.TP
+.B EINVAL
+.I ms_flags
+includes more than one of
+.BR MS_SHARED ,
+.BR MS_PRIVATE ,
+.BR MS_SLAVE ,
+or
+.BR MS_UNBINDABLE .
+.TP
+.B EINVAL
+An attempt was made to bind mount an unbindable mount.
+.TP
+.B ELOOP
+Too many links encountered during pathname resolution.
+.TP
+.B EMFILE
+The process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENAMETOOLONG
+A pathname was longer than
+.BR MAXPATHLEN .
+.TP
+.B ENODEV
+Filesystem
+.I fsname
+not configured in the kernel.
+.TP
+.B ENOENT
+A pathname was empty or had a nonexistent component.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the call.
+.TP
+.B ENOTBLK
+.I source
+is not a block device (and a device was required).
+.TP
+.B ENOTDIR
+.IR pathname ,
+or a prefix of
+.IR source ,
+is not a directory.
+.TP
+.B ENXIO
+The major number of the block device
+.I source
+is out of range.
+.TP
+.B EPERM
+The caller does not have the required privileges.
+.SH CONFORMING TO
+These functions are Linux-specific and should not be used in programs intended
+to be portable.
+.SH VERSIONS
+.BR fsopen "(), " fsmount "() and " fspick ()
+were added to Linux in kernel 4.18.
+.SH NOTES
+Glibc does not (yet) provide a wrapper for the
+.BR fsopen "(), " fsmount "() or " fspick ()
+system calls; call them using
+.BR syscall (2).
+.SH SEE ALSO
+.BR mountpoint (1),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR umount (2),
+.BR mount_namespaces (7),
+.BR path_resolution (7),
+.BR findmnt (8),
+.BR lsblk (8),
+.BR mount (8),
+.BR umount (8)
diff --git a/man2/fspick.2 b/man2/fspick.2
new file mode 100644
index 000000000..2bf59fc3e
--- /dev/null
+++ b/man2/fspick.2
@@ -0,0 +1 @@
+.so man2/fsopen.2