From: 'Szabolcs Nagy' <szabolcs.nagy@xxxxxxx> > Sent: 01 November 2022 11:45 > > The 11/01/2022 10:02, David Laight wrote: > > From: Szabolcs Nagy > > > Sent: 01 November 2022 09:11 > > > > > > The 10/31/2022 21:46, Theodore Ts'o wrote: > > > > On Mon, Oct 31, 2022 at 12:44:59PM +0000, Szabolcs Nagy wrote: > > > > > and such fcntl call can happen with c code that just passes > > > > > F_SEAL_WRITE since it is an int and e.g. with aarch64 pcs rules > > > > > it is passed in a register where top bits can be non-zero > > > > > (unlikely in practice but valid). > > > > > > > > In Linux's aarch64 ABI, an int is a 4-byte value. It is *not* an > > > > 8-byte value. So passing in "F_SEAL_WRITE | 0xF00000000" as an int > > > > (as in your example) is simply not valid thing for the userspace > > > > program to do. > > > > > > > > Now, if there is a C program which has "int c = F_SEAL_WRITE", if the > > > > PCS allows the compiler to pass a function paramter c --- for example > > > > f(a, b, c) --- where the 4-byte paramter 'c' is placed in a 64-bit > > > > register where the high bits of the 64-bit register contains non-zero > > > > garbage values, I would argue that this is a bug in the PCS and/or the > > > > compiler. > > > > > > the callee uses va_arg(ap, type) to get the argument, > > > and if the type is wider than what was actually passed > > > then anything can happen. in practice what happens is > > > that the top bits can be non-zero. > > > > > > many pcs are affected (aarch64 is the one i know well, > > > but at least x86_64, arm are affected too). and even if > > > it was aarch64 pcs only, it is incompetent to say that > > > the pcs is wrong: that's a constraint we are working with. > > > > > > the kernel must not read a wider type than what it > > > documents as argument to variadic functions in the c api. > > > (it does not make much sense to expect anything there > > > anyway, but it can break userspace) > > > > The Linux kernel just assumes that the varargs call looks like > > a non-varags call with the same parameters. > > (It doesn't use va_arg()) > > All syscall arguments are passed in registers (unlike BSDs > > where they can also be on the user stack). > > On 64bit systems the same registers are expected to be used > > for 64bit and 32bit integers and for pointers. > > 32bit values usually get masked because they get passed to > > a function with an 'int' argument. > > > > If any fcntl() calls require a 64bit value and the C ABI > > might leave non-zero high bits in an register containing > > a 32bit value (esp. to a varargs function) then the calling > > code will need to cast such arguments to 64 bits. > > the entire point of my mail is that it is not possible > to tell in the libc if the vararg is pointer or int. > > so in case a user passed an int, the libc cannot fix > that up, like it usually does for other cases where > linux syscall abi is incompatible with the c api. > > let me go through step by step what is going on: ... > long internal_syscall(int, long, long, long, long, long, long); > int fcntl(int fd, int cmd, ...) > { > va_list ap; > va_start(ap, cmd); > /* this is non-conforming C: wrong type in va_arg, > but that's not relevant since libc can implement > this as target specific asm, the important bit is > that the correct type is not known: libc cannot > replicate the kernel side dispatch logic because > new cmd can be introduced in the future with > arbitrary type arg. > > top 32bits of arg are non-zero, libc cannot > zero them here as arg may be long or pointer. */ > long arg = va_arg(ap, long); Here libc has to assume that int, long and pointer are passed the same way. This is true for everything Linux actually runs on. But wouldn't be if, for example, a 64bit arch just pushed 32bit arguments on stack. Or if a 32 bit one passed integer and pointer args in different types of registers. But all 64bit ones use the same GP registers for int/long/pointer. > kernel code: > ------------ > SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg) > { That is just a wrapper and calls do_fcntl(). which needs changing to be add: arg &= ~0U; before the switch(cmd) { David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)