On 14-12-22 11:05 pm, David Laight wrote:
From: Greg Kroah-Hartman
Sent: 12 December 2022 13:35
On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote:
Function pointer ki_complete() expects 'long' as its second
argument, but we pass integer from ffs_user_copy_worker. This
might cause a CFI failure, as ki_complete is an indirect call
with mismatched prototype. Fix this by typecasting the second
argument to long.
"might"? Does it or not? If it does, why hasn't this been reported
before?
Does the cast even help at all.
Actually I also have these same questions
- why we haven't seen any instances other than this one?
- why its not seen on other indirect function calls?
Here is the the call stack of the failure that we got.
[ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target:
0xffffffe5fc811f98)
[ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S
W OE 5.15.41-android13-8-g5ffc5644bd20 #1
[ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt
[ 323.288752][ T7] Call trace:
[ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8
[ 323.288772][ T7] dump_stack_lvl+0x80/0xb8
[ 323.288785][ T7] panic+0x180/0x444
[ 323.288797][ T7] find_check_fn+0x0/0x218
[ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204
[ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8
[ 323.288837][ T7] worker_thread+0x3ec/0x920
[ 323.288850][ T7] kthread+0x168/0x1dc
[ 323.288859][ T7] ret_from_fork+0x10/0x20
[ 323.288866][ T7] SMP: stopping secondary CPUs
And from address to line translation, we got know the issue is from
ffs_user_copy_worker+0x1dc/0x204
||
io_data->kiocb->ki_complete(io_data->kiocb, ret);
And "find_check_fn" was getting invoked from ki_complete. Only thing
that I found suspicious about ki_complete() is its argument types.
That's why I pushed this patch here, so that we can discuss this out here.
Thanks in advance
...
- io_data->kiocb->ki_complete(io_data->kiocb, ret);
+ io_data->kiocb->ki_complete(io_data->kiocb, (long)ret);
...
If definition of the parameter in the structure member ki_complete()
definition is 'long' then the compiler has to promote 'ret' to long
anyway. CFI has nothing to do with it.
OTOH if you've used a cast to assign a function with a
different prototype to ki_complete then 'all bets are off'
and you get all the run time errors you deserve.
CFI just converts some of them to compile time errors.
For instance if you assign xx_complete(long) to (*ki_complete)(int)
then it is very likely that xx_complete() will an argument
with some of the high bits set.
But adding a cast to the call - ki_complete((long)int_var)
will make absolutely no difference.
The compiler wont zero/sign extend int_var to 64bits for you,
that will just get optimised away and the high bits will
be unchanged.
You're description seems to be the other way around (which might
be safe, but CFI probably still barfs).
But you need to fix the indirect calls so the function types
match.
So does that mean, we need to add casts in al indirect calls to match
the function signature?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)