On Wed, Dec 18, 2019 at 11:45 PM Ben Hutchings <ben.hutchings@xxxxxxxxxxxxxxx> wrote: > On Tue, 2019-12-17 at 23:17 +0100, Arnd Bergmann wrote: > > --- /dev/null > > +++ b/Documentation/core-api/ioctl.rst > > +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining > > +ioctl commands that follow modern conventions: ``_IO``, ``_IOR``, > > +``_IOW``, and ``_IORW``. These should be used for all new commands, > > Typo: "_IORW" should be "_IOWR". Fixed now > > +with the correct parameters: > > + > > +_IO/_IOR/_IOW/_IOWR > > + The macro name determines whether the argument is used for passing > > + data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is > > + not a pointer (_IO). It is possible but not recommended to pass an > > + integer value instead of a pointer with _IO. > > I feel the explanation of _IO here could be confusing. I think what > you meant to say was that it is possible, but not recommended, to pass > integers directly (arg is integer) rather than indirectly (arg is > pointer to integer). I suggest the alternate wording: > > The macro name specifies how the argument will be used. It may be a > pointer to data to be passed into the kernel (_IOW), out of the kernel > (_IOR), or both (_IOWR). The argument may also be an integer value > instead of a pointer (_IO), but this is not recommended. That's probably better than my version, but I find that misleading as well: it sounds like _IO() is not recommended, but having no argument with _IO() is actually fine. This is what I have now: The macro name specifies how the argument will be used. It may be a pointer to data to be passed into the kernel (_IOW), out of the kernel (_IOR), or both (_IOWR). _IO can indicate either commands with no argument or those passing an integer value instead of a pointer. It is recommended to only use _IO for commands without arguments, and use pointers for passing data. > > +data_type > > + The name of the data type pointed to by the argument, the command number > > + encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer, > > + leading to a limit of 8191 bytes for the maximum size of the argument. > > + Note: do not pass sizeof(data_type) type into _IOR/IOW, as that will > > + lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t). > > You left out _IOWR here. It might also be worth mentioning that _IO > doesn't have this parameter. Changed now. > [...] > > +Return code > > +=========== > > + > > +ioctl commands can return negative error codes as documented in errno(3), > > +these get turned into errno values in user space. > > Use a semi-colon instead of a comma, or change "these" to "which". done > > On success, the return > > +code should be zero. It is also possible but not recommended to return > > +a positive 'long' value. > > + > > +When the ioctl callback is called with an unknown command number, the > > +handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in > > +-ENOTTY being returned from the system call. Some subsystems return > > +-ENOSYS or -EINVAL here for historic reasons, but this is wrong. > > + > > +Prior to Linux-5.5, compat_ioctl handlers were required to return > > Space instead of hyphen. done > > +-ENOIOCTLCMD in order to use the fallback conversion into native > > +commands. As all subsystems are now responsible for handling compat > > +mode themselves, this is no longer needed, but it may be important to > > +consider when backporting bug fixes to older kernels. > > + > > +Timestamps > > +========== > > + > > +Traditionally, timestamps and timeout values are passed as ``struct > > +timespec`` or ``struct timeval``, but these are problematic because of > > +incompatible definitions of these structures in user space after the > > +move to 64-bit time_t. > > + > > +The __kernel_timespec type can be used instead to be embedded in other > > It's not a typedef, so ``struct __kernel_timespec``. done > [...] > > +32-bit compat mode > > +================== > > + > > +In order to support 32-bit user space running on a 64-bit machine, each > > +subsystem or driver that implements an ioctl callback handler must also > > +implement the corresponding compat_ioctl handler. > > + > > +As long as all the rules for data structures are followed, this is as > > +easy as setting the .compat_ioctl pointer to a helper function such as > > +compat_ptr_ioctl() or blkdev_compat_ptr_ioctl(). > > + > > +compat_ptr() > > +------------ > > + > > +On the s/390 architecture, 31-bit user space has ambiguous representations > > IBM never used the name "S/390" for the 64-bit mainframe architecture, > but they have rebranded it several times. Rather than trying to follow > what it's called this year, maybe just write "s390" to match what we > usually call it? ok, done > > + > > + has four bytes of padding between a and b on x86-64, plus another four > > + bytes of padding at the end, but no padding on i386, and it needs a > > + compat_ioctl conversion handler to translate between the two formats. > > + > > + To avoid this problem, all structures should have their members > > + naturally aligned, or explicit reserved fields added in place of the > > + implicit padding. > > This should explain how to check that - presumably by running pahole on > some sensible architecture. Ok, added "The ``pahole`` tool can be used for checking the alignment.". > > +* On ARM OABI user space, 16-bit member variables have 32-bit > > + alignment, making them incompatible with modern EABI kernels. > > I thought that OABI required structures as a whole to have alignment of > 4, not individual members? Which obviously does affect small > structures as members of other structures. You are right, I clearly misunderstood that. Changed the paragraph now to * On ARM OABI user space, structures are padded to multiples of 32-bit, making some structs incompatible with modern EABI kernels if they do not end on a 32-bit boundary. * On the m68k architecture, struct members are not guaranteed to have an alignment greater than 16-bit, which is a problem when relying on implicit padding. > [...] > > +Information leaks > > +================= > > + > > +Uninitialized data must not be copied back to user space, as this can > > +cause an information leak, which can be used to defeat kernel address > > +space layout randomization (KASLR), helping in an attack. > > + > > +As explained for the compat mode, it is best to not avoid any implicit > > Delete "not". Done. > +padding in data structures, but if there is already padding in existing > > +structures, the kernel driver must be careful to zero out the padding > > +using memset() or similar before copying it to user space. > > This sentence is rather too long. Also it can be read as suggesting > that one should somehow identify and memset() the padding just before > copying to user-space. I suggest an alternate wording: > > For this reason (and for compat support) it is best to avoid any > implicit padding in data structures. Where there is implicit padding > in an existing structure, kernel drivers must be careful to fully > initialize an instance of the structure before copying it to user > space. This is usually done by calling memset() before assigning to > individual members. Sounds good, I've taken that paragraph now. > [...] > > +Alternatives to ioctl > > +===================== > [...] > > +* A custom file system can provide extra flexibility with a simple > > + user interface but add a lot of complexity to the implementation. > > Typo: "add" should be "adds". Fixed Thanks for all the good suggestions! Arnd