Re: [PATCH v2 2/2] rust: miscdevice: add base miscdevice abstraction

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 2 Oct 2024 16:23:33 +0200

On Wed, Oct 02, 2024 at 03:36:33PM GMT, Alice Ryhl wrote:
> On Wed, Oct 2, 2024 at 3:24 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 02, 2024 at 12:48:12PM GMT, Arnd Bergmann wrote:
> > > On Tue, Oct 1, 2024, at 08:22, Alice Ryhl wrote:
> > > > +#[cfg(CONFIG_COMPAT)]
> > > > +unsafe extern "C" fn fops_compat_ioctl<T: MiscDevice>(
> > > > +    file: *mut bindings::file,
> > > > +    cmd: c_uint,
> > > > +    arg: c_ulong,
> > > > +) -> c_long {
> > > > +    // SAFETY: The compat ioctl call of a file can access the private
> > > > data.
> > > > +    let private = unsafe { (*file).private_data };
> > > > +    // SAFETY: Ioctl calls can borrow the private data of the file.
> > > > +    let device = unsafe { <T::Ptr as ForeignOwnable>::borrow(private)
> > > > };
> > > > +
> > > > +    match T::compat_ioctl(device, cmd as u32, arg as usize) {
> > > > +        Ok(ret) => ret as c_long,
> > > > +        Err(err) => err.to_errno() as c_long,
> > > > +    }
> > > > +}
> > >
> > > I think this works fine as a 1:1 mapping of the C API, so this
> > > is certainly something we can do. On the other hand, it would be
> > > nice to improve the interface in some way and make it better than
> > > the C version.
> > >
> > > The changes that I think would be straightforward and helpful are:
> > >
> > > - combine native and compat handlers and pass a flag argument
> > >   that the callback can check in case it has to do something
> > >   special for compat mode
> > >
> > > - pass the 'arg' value as both a __user pointer and a 'long'
> > >   value to avoid having to cast. This specifically simplifies
> > >   the compat version since that needs different types of
> > >   64-bit extension for incoming 32-bit values.
> > >
> > > On top of that, my ideal implementation would significantly
> > > simplify writing safe ioctl handlers by using the information
> > > encoded in the command word:
> > >
> > >  - copy the __user data into a kernel buffer for _IOW()
> > >    and back for _IOR() type commands, or both for _IOWR()
> > >  - check that the argument size matches the size of the
> > >    structure it gets assigned to
> >
> > - Handle versioning by size for ioctl()s correctly so stuff like:
> >
> >         /* extensible ioctls */
> >         switch (_IOC_NR(ioctl)) {
> >         case _IOC_NR(NS_MNT_GET_INFO): {
> >                 struct mnt_ns_info kinfo = {};
> >                 struct mnt_ns_info __user *uinfo = (struct mnt_ns_info __user *)arg;
> >                 size_t usize = _IOC_SIZE(ioctl);
> >
> >                 if (ns->ops->type != CLONE_NEWNS)
> >                         return -EINVAL;
> >
> >                 if (!uinfo)
> >                         return -EINVAL;
> >
> >                 if (usize < MNT_NS_INFO_SIZE_VER0)
> >                         return -EINVAL;
> >
> >                 return copy_ns_info_to_user(to_mnt_ns(ns), uinfo, usize, &kinfo);
> >         }
> >
> > This is not well-known and noone versions ioctl()s correctly and if they
> > do it's their own hand-rolled thing. Ideally, this would be a first
> > class concept with Rust bindings and versioning like this would be
> > universally enforced.
> 
> Could you point me at some more complete documentation or example of
> how to correctly do versioning?

So I don't want you to lead astray so if this is out of reach for now I
understand but basically we do have the concept of versioning structs by
size.

So I'm taking an example from the mount_setattr() man page though
openat2() would also work:

   Extensibility
       In order to allow for future extensibility, mount_setattr()
       requires the user-space application to specify the size of the
       mount_attr structure that it is passing.  By  providing  this
       information,  it  is  possible for mount_setattr() to provide
       both forwards- and backwards-compatibility, with size acting as
       an implicit version number.  (Because new extension fields will
       always be appended, the structure size will always increase.)
       This extensibility design is very similar  to  other  system
       calls  such  as  perf_setattr(2),  perf_event_open(2), clone3(2)
       and openat2(2).

       Let  usize  be the size of the structure as specified by the
       user-space application, and let ksize be the size of the
       structure which the kernel supports, then there are three cases
       to consider:

       •  If ksize equals usize, then there is no version mismatch and
          attr can be used verbatim.

       •  If ksize is larger than usize, then there are some extension
	  fields that the kernel supports which the user-space
	  application is unaware of.  Because a zero value in any  added
	  extension field signifies a no-op, the kernel treats all of
	  the extension fields not provided by the user-space
	  application as having zero values.  This provides
	  backwards-compatibility.

       •  If ksize is smaller than usize, then there are some extension
	  fields which the user-space application is aware of but which
	  the kernel does not support.  Because any extension field must
	  have  its  zero  values signify a no-op, the kernel can safely
	  ignore the unsupported extension fields if they are all zero.
	  If any unsupported extension fields are non-zero, then -1 is
	  returned and errno is set to E2BIG.  This provides
	  forwards-compatibility.

   [...]

In essence ioctl()s are already versioned by size because the size of
the passed argument is encoded in the ioctl cmd:

struct my_struct {
	__u64 a;
};

ioctl(fd, MY_IOCTL, &my_struct);

then _IOC_SIZE(MY_IOCTL) will give you the expected size.

If the kernel extends the struct to:

struct my_struct {
	__u64 a;
	__u64 b;
};

then the value of MY_IOCTL changes. Most code currently cannot deal with
such an extension because it's coded as a simple switch on the ioctl
command:

switch (cmd) {
	case MY_IOCTL:
		/* do something */
		break;
}

So on an older kernel the ioctl would now fail because it won't be able
to handle MY_STRUCT with an increased struct my_struct size because the
switch wouldn't trigger.

The correct way to handle this is to grab the actual ioctl number out
from the ioctl command:

switch (_IOC_NR(cmd)) {
        case _IOC_NR(MY_STRUCT): {

and then grab the size of the ioctl:

        size_t usize = _IOC_SIZE(ioctl);

perform sanity checks:

	// garbage
        if (usize < MY_STRUCT_SIZE_VER0)
                return -EINVAL;

	// ¿qué?
	if (usize > PAGE_SIZE)
		return -EINVAL;

and then copy the stuff via copy_struct_from_user() or copy back out to
user via other means.

This way you can safely extend ioctl()s in a backward and forward
compatible manner and if we can enforce this for new drivers then I
think that's what we should do.