Re: [RFC PATCH] fs: Add vfs_masks_device_ioctl*() helpers

"Günther Noack" <gnoack@xxxxxxxxxx> · Tue, 5 Mar 2024 19:13:33 +0100

Hello!

More questions than answers in this code review, but maybe this discusison will
help to get a clearer picture about what we are going for here.

On Mon, Feb 19, 2024 at 07:35:39PM +0100, Mickaël Salaün wrote:
> vfs_masks_device_ioctl() and vfs_masks_device_ioctl_compat() are useful
> to differenciate between device driver IOCTL implementations and
> filesystem ones.  The goal is to be able to filter well-defined IOCTLs
> from per-device (i.e. namespaced) IOCTLs and control such access.
> 
> Add a new ioctl_compat() helper, similar to vfs_ioctl(), to wrap
> compat_ioctl() calls and handle error conversions.
> 
> Cc: Arnd Bergmann <arnd@xxxxxxxx>
> Cc: Christian Brauner <brauner@xxxxxxxxxx>
> Cc: Günther Noack <gnoack@xxxxxxxxxx>
> ---
>  fs/ioctl.c         | 101 +++++++++++++++++++++++++++++++++++++++++----
>  include/linux/fs.h |  12 ++++++
>  2 files changed, 105 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 76cf22ac97d7..f72c8da47d21 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -763,6 +763,38 @@ static int ioctl_fssetxattr(struct file *file, void __user *argp)
>  	return err;
>  }
>  
> +/*
> + * Safeguard to maintain a list of valid IOCTLs handled by do_vfs_ioctl()
> + * instead of def_blk_fops or def_chr_fops (see init_special_inode).
> + */
> +__attribute_const__ bool vfs_masked_device_ioctl(const unsigned int cmd)
> +{
> +	switch (cmd) {
> +	case FIOCLEX:
> +	case FIONCLEX:
> +	case FIONBIO:
> +	case FIOASYNC:
> +	case FIOQSIZE:
> +	case FIFREEZE:
> +	case FITHAW:
> +	case FS_IOC_FIEMAP:
> +	case FIGETBSZ:
> +	case FICLONE:
> +	case FICLONERANGE:
> +	case FIDEDUPERANGE:
> +	/* FIONREAD is forwarded to device implementations. */
> +	case FS_IOC_GETFLAGS:
> +	case FS_IOC_SETFLAGS:
> +	case FS_IOC_FSGETXATTR:
> +	case FS_IOC_FSSETXATTR:
> +	/* file_ioctl()'s IOCTLs are forwarded to device implementations. */
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +EXPORT_SYMBOL(vfs_masked_device_ioctl);

[
Technical implementation notes about this function: the list of IOCTLs here are
the same ones which do_vfs_ioctl() implements directly.

There are only two cases in which do_vfs_ioctl() does more complicated handling:

(1) FIONREAD falls back to the device's ioctl implemenetation.
    Therefore, we omit FIONREAD in our own list - we do not want to allow that.
(2) The default case falls back to the file_ioctl() function, but *only* for
    S_ISREG() files, so it does not matter for the Landlock case.
]

## What we are actually trying to do (?)

Let me try to take a step back and paraphrase what I think we are *actually*
trying to do here -- please correct me if I am wrong about that:

I think what we *really* are trying to do is to control from the Landlock LSM
whether the filp->f_op->unlocked_ioctl() or filp->f_op->ioctl_compat()
operations are getting called for device files.

So in a world where we cared only about correctness, we could create a new LSM
hook security_file_vfs_ioctl(), which gets checked just before these two f_op
operations get called.  With that, we could permit all IOCTLs that are
implemented in fs/ioctl.c, and we could deny all IOCTL commands that are
implemented in the device implementation.

I guess the reasons why we are not using that approach are performance, and that
it might mess up the LSM hook interface with special cases that only Landlcok
needs?  But it seems like it would be easier to reason about..?  Or maybe we can
find a middle ground, where we have the existing hook return a special value
with the meaning "permit this IOCTL, but do not invoke the f_op hook"?

## What we implemented

Of course, the existing security_file_ioctl LSM hook works differently, and so
with that hook, we need to make our blocking decision purely based on the struct
file*, the IOCTL command number and the IOCTL argument.

So in order to make that decision correctly based on that information, we end up
listing all the IOCTLs which are directly(!) implemented in do_vfs_ioctl(),
because for Landlock, this is the list of IOCTL commands which is safe to permit
on device files.  And we need to keep that list in sync with fs/ioctl.c, which
is why it ended up in the same place in this commit.

(Is it maybe possible to check with a KUnit test whether such lists are in sync?
It sounds superficially like it should be feasible to create a device file which
records whether its ioctl implementation was called.  So we could at least check
that the Landlock command list is a subset of the do_vfs_ioctl() one.)

> +
>  /*
>   * do_vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
>   * It's just a simple helper for sys_ioctl and compat_sys_ioctl.
> @@ -858,6 +890,8 @@ SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
>  {
>  	struct fd f = fdget(fd);
>  	int error;
> +	const struct inode *inode;
> +	bool is_device;
>  
>  	if (!f.file)
>  		return -EBADF;
> @@ -866,9 +900,18 @@ SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
>  	if (error)
>  		goto out;
>  
> +	inode = file_inode(f.file);
> +	is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> +	if (is_device && !vfs_masked_device_ioctl(cmd)) {
> +		error = vfs_ioctl(f.file, cmd, arg);
> +		goto out;
> +	}
> +
>  	error = do_vfs_ioctl(f.file, fd, cmd, arg);
> -	if (error == -ENOIOCTLCMD)
> +	if (error == -ENOIOCTLCMD) {
> +		WARN_ON_ONCE(is_device);
>  		error = vfs_ioctl(f.file, cmd, arg);
> +	}

It is not obvious at first that adding this list requires a change to the ioctl
syscall implementations.  If I understand this right, the idea is that you want
to be 100% sure that we are not calling vfs_ioctl() for the commands in that
list.  And there is a scenario where this could potentially happen:

do_vfs_ioctl() implements most things like this:

static int do_vfs_ioctl(...) {
	switch (cmd) {
	/* many cases like the following: */
	case FITHAW:
		return ioctl_fsthaw(filp);
	/* ... */
	}
	return -ENOIOCTLCMD;
}

So I believe the scenario you want to avoid is the one where ioctl_fsthaw() or
one of the other functions return -ENOIOCTLCMD by accident, and where that will
then make the surrounding syscall implementation fall back to vfs_ioctl()
despite the cmd being listed as safe for Landlock?  Is that right?

Looking at do_vfs_ioctl() and its helper functions, I am getting the impression
that -ENOIOCTLCMD is only supposed to be returned at the very end of it, but not
by any of the helper functions?  If that were the case, we could maybe just as
well just solve that problem local to do_vfs_ioctl()?

A bit inelegant maybe, but just to get the idea across:

static int sanitize_enoioctlcmd(int res) {
	if (res == -ENOIOCTLCMD)
		return ENOTTY;
	return res;
}

static int do_vfs_ioctl(...) {
	switch (cmd) {
	/* many cases like the following: */
	case FITHAW:
		return sanitize_enoioctlcmd(ioctl_fsthaw(filp));
	/* ... */
	}
	return -ENOIOCTLCMD;
}

Would that be better?

—Günther