RE: linux interprets an fcntl int arg as long

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: 'Szabolcs Nagy' <szabolcs.nagy@xxxxxxx>
> Sent: 01 November 2022 11:45
> 
> The 11/01/2022 10:02, David Laight wrote:
> > From: Szabolcs Nagy
> > > Sent: 01 November 2022 09:11
> > >
> > > The 10/31/2022 21:46, Theodore Ts'o wrote:
> > > > On Mon, Oct 31, 2022 at 12:44:59PM +0000, Szabolcs Nagy wrote:
> > > > > and such fcntl call can happen with c code that just passes
> > > > > F_SEAL_WRITE since it is an int and e.g. with aarch64 pcs rules
> > > > > it is passed in a register where top bits can be non-zero
> > > > > (unlikely in practice but valid).
> > > >
> > > > In Linux's aarch64 ABI, an int is a 4-byte value.  It is *not* an
> > > > 8-byte value.  So passing in "F_SEAL_WRITE | 0xF00000000" as an int
> > > > (as in your example) is simply not valid thing for the userspace
> > > > program to do.
> > > >
> > > > Now, if there is a C program which has "int c = F_SEAL_WRITE", if the
> > > > PCS allows the compiler to pass a function paramter c --- for example
> > > > f(a, b, c) --- where the 4-byte paramter 'c' is placed in a 64-bit
> > > > register where the high bits of the 64-bit register contains non-zero
> > > > garbage values, I would argue that this is a bug in the PCS and/or the
> > > > compiler.
> > >
> > > the callee uses va_arg(ap, type) to get the argument,
> > > and if the type is wider than what was actually passed
> > > then anything can happen. in practice what happens is
> > > that the top bits can be non-zero.
> > >
> > > many pcs are affected (aarch64 is the one i know well,
> > > but at least x86_64, arm are affected too). and even if
> > > it was aarch64 pcs only, it is incompetent to say that
> > > the pcs is wrong: that's a constraint we are working with.
> > >
> > > the kernel must not read a wider type than what it
> > > documents as argument to variadic functions in the c api.
> > > (it does not make much sense to expect anything there
> > > anyway, but it can break userspace)
> >
> > The Linux kernel just assumes that the varargs call looks like
> > a non-varags call with the same parameters.
> > (It doesn't use va_arg())
> > All syscall arguments are passed in registers (unlike BSDs
> > where they can also be on the user stack).
> > On 64bit systems the same registers are expected to be used
> > for 64bit and 32bit integers and for pointers.
> > 32bit values usually get masked because they get passed to
> > a function with an 'int' argument.
> >
> > If any fcntl() calls require a 64bit value and the C ABI
> > might leave non-zero high bits in an register containing
> > a 32bit value (esp. to a varargs function) then the calling
> > code will need to cast such arguments to 64 bits.
> 
> the entire point of my mail is that it is not possible
> to tell in the libc if the vararg is pointer or int.
> 
> so in case a user passed an int, the libc cannot fix
> that up, like it usually does for other cases where
> linux syscall abi is incompatible with the c api.
> 
> let me go through step by step what is going on:
...
> long internal_syscall(int, long, long, long, long, long, long);
> int fcntl(int fd, int cmd, ...)
> {
> 	va_list ap;
> 	va_start(ap, cmd);
> 	/* this is non-conforming C: wrong type in va_arg,
> 	   but that's not relevant since libc can implement
> 	   this as target specific asm, the important bit is
> 	   that the correct type is not known: libc cannot
> 	   replicate the kernel side dispatch logic because
> 	   new cmd can be introduced in the future with
> 	   arbitrary type arg.
> 
> 	   top 32bits of arg are non-zero, libc cannot
> 	   zero them here as arg may be long or pointer.  */
> 	long arg = va_arg(ap, long);

Here libc has to assume that int, long and pointer are passed
the same way.
This is true for everything Linux actually runs on.
But wouldn't be if, for example, a 64bit arch just pushed
32bit arguments on stack.
Or if a 32 bit one passed integer and pointer args in different
types of registers.
But all 64bit ones use the same GP registers for int/long/pointer.

> kernel code:
> ------------
> SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
> {

That is just a wrapper and calls do_fcntl().
which needs changing to be add:
	arg &= ~0U;
before the switch(cmd) {

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux