Re: [PATCH v2 27/27] Documentation: document ioctl interfaces better

Arnd Bergmann <arnd@xxxxxxxx> · Thu, 19 Dec 2019 21:05:13 +0100

On Wed, Dec 18, 2019 at 11:45 PM Ben Hutchings
<ben.hutchings@xxxxxxxxxxxxxxx> wrote:
> On Tue, 2019-12-17 at 23:17 +0100, Arnd Bergmann wrote:
> > --- /dev/null
> > +++ b/Documentation/core-api/ioctl.rst
> > +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
> > +ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
> > +``_IOW``, and ``_IORW``. These should be used for all new commands,
>
> Typo: "_IORW" should be "_IOWR".

Fixed now

> > +with the correct parameters:
> > +
> > +_IO/_IOR/_IOW/_IOWR
> > +   The macro name determines whether the argument is used for passing
> > +   data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
> > +   not a pointer (_IO). It is possible but not recommended to pass an
> > +   integer value instead of a pointer with _IO.
>
> I feel the explanation of _IO here could be confusing.  I think what
> you meant to say was that it is possible, but not recommended, to pass
> integers directly (arg is integer) rather than indirectly (arg is
> pointer to integer).  I suggest the alternate wording:
>
> The macro name specifies how the argument will be used.  It may be a
> pointer to data to be passed into the kernel (_IOW), out of the kernel
> (_IOR), or both (_IOWR).  The argument may also be an integer value
> instead of a pointer (_IO), but this is not recommended.

That's probably better than my version, but I find that misleading as well:
it sounds like _IO() is not recommended, but having no argument with
_IO() is actually fine. This is what I have now:

   The macro name specifies how the argument will be used.  It may be a
   pointer to data to be passed into the kernel (_IOW), out of the kernel
   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
   argument or those passing an integer value instead of a pointer.
   It is recommended to only use _IO for commands without arguments,
   and use pointers for passing data.

> > +data_type
> > +  The name of the data type pointed to by the argument, the command number
> > +  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
> > +  leading to a limit of 8191 bytes for the maximum size of the argument.
> > +  Note: do not pass sizeof(data_type) type into _IOR/IOW, as that will
> > +  lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
>
> You left out _IOWR here.  It might also be worth mentioning that _IO
> doesn't have this parameter.

Changed now.

> [...]
> > +Return code
> > +===========
> > +
> > +ioctl commands can return negative error codes as documented in errno(3),
> > +these get turned into errno values in user space.
>
> Use a semi-colon instead of a comma, or change "these" to "which".

done

> > On success, the return
> > +code should be zero. It is also possible but not recommended to return
> > +a positive 'long' value.
> > +
> > +When the ioctl callback is called with an unknown command number, the
> > +handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
> > +-ENOTTY being returned from the system call. Some subsystems return
> > +-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
> > +
> > +Prior to Linux-5.5, compat_ioctl handlers were required to return
>
> Space instead of hyphen.

done

> > +-ENOIOCTLCMD in order to use the fallback conversion into native
> > +commands. As all subsystems are now responsible for handling compat
> > +mode themselves, this is no longer needed, but it may be important to
> > +consider when backporting bug fixes to older kernels.
> > +
> > +Timestamps
> > +==========
> > +
> > +Traditionally, timestamps and timeout values are passed as ``struct
> > +timespec`` or ``struct timeval``, but these are problematic because of
> > +incompatible definitions of these structures in user space after the
> > +move to 64-bit time_t.
> > +
> > +The __kernel_timespec type can be used instead to be embedded in other
>
> It's not a typedef, so ``struct __kernel_timespec``.

done

> [...]
> > +32-bit compat mode
> > +==================
> > +
> > +In order to support 32-bit user space running on a 64-bit machine, each
> > +subsystem or driver that implements an ioctl callback handler must also
> > +implement the corresponding compat_ioctl handler.
> > +
> > +As long as all the rules for data structures are followed, this is as
> > +easy as setting the .compat_ioctl pointer to a helper function such as
> > +compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
> > +
> > +compat_ptr()
> > +------------
> > +
> > +On the s/390 architecture, 31-bit user space has ambiguous representations
>
> IBM never used the name "S/390" for the 64-bit mainframe architecture,
> but they have rebranded it several times.  Rather than trying to follow
> what it's called this year, maybe just write "s390" to match what we
> usually call it?

ok, done

> > +
> > +  has four bytes of padding between a and b on x86-64, plus another four
> > +  bytes of padding at the end, but no padding on i386, and it needs a
> > +  compat_ioctl conversion handler to translate between the two formats.
> > +
> > +  To avoid this problem, all structures should have their members
> > +  naturally aligned, or explicit reserved fields added in place of the
> > +  implicit padding.
>
> This should explain how to check that - presumably by running pahole on
> some sensible architecture.

Ok, added "The ``pahole`` tool can be used for checking the alignment.".

> > +* On ARM OABI user space, 16-bit member variables have 32-bit
> > +  alignment, making them incompatible with modern EABI kernels.
>
> I thought that OABI required structures as a whole to have alignment of
> 4, not individual members?  Which obviously does affect small
> structures as members of other structures.

You are right, I clearly misunderstood that. Changed the paragraph now to

* On ARM OABI user space, structures are padded to multiples of 32-bit,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.

* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16-bit, which is a problem when relying on
  implicit padding.

> [...]
> > +Information leaks
> > +=================
> > +
> > +Uninitialized data must not be copied back to user space, as this can
> > +cause an information leak, which can be used to defeat kernel address
> > +space layout randomization (KASLR), helping in an attack.
> > +
> > +As explained for the compat mode, it is best to not avoid any implicit
>
> Delete "not".

Done.

 > +padding in data structures, but if there is already padding in existing
> > +structures, the kernel driver must be careful to zero out the padding
> > +using memset() or similar before copying it to user space.
>
> This sentence is rather too long.  Also it can be read as suggesting
> that one should somehow identify and memset() the padding just before
> copying to user-space.  I suggest an alternate wording:
>
> For this reason (and for compat support) it is best to avoid any
> implicit padding in data structures.  Where there is implicit padding
> in an existing structure, kernel drivers must be careful to fully
> initialize an instance of the structure before copying it to user
> space.  This is usually done by calling memset() before assigning to
> individual members.

Sounds good, I've taken that paragraph now.

> [...]
> > +Alternatives to ioctl
> > +=====================
> [...]
> > +* A custom file system can provide extra flexibility with a simple
> > +  user interface but add a lot of complexity to the implementation.
>
> Typo: "add" should be "adds".

Fixed

Thanks for all the good suggestions!

      Arnd