From: Dan Carpenter <error27@xxxxxxxxx> Date: Fri, 23 Dec 2022 12:04:18 +0300 > On Thu, Dec 22, 2022 at 06:21:03PM +0530, Prashanth K wrote: > > > > > > On 14-12-22 11:05 pm, David Laight wrote: > > > From: Greg Kroah-Hartman > > > > Sent: 12 December 2022 13:35 > > > > > > > > On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote: > > > > > Function pointer ki_complete() expects 'long' as its second > > > > > argument, but we pass integer from ffs_user_copy_worker. This > > > > > might cause a CFI failure, as ki_complete is an indirect call > > > > > with mismatched prototype. Fix this by typecasting the second > > > > > argument to long. > > > > > > > > "might"? Does it or not? If it does, why hasn't this been reported > > > > before? > > > > > > Does the cast even help at all. > > Actually I also have these same questions > > - why we haven't seen any instances other than this one? > > - why its not seen on other indirect function calls? > > > > Here is the the call stack of the failure that we got. > > > > [ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target: > > 0xffffffe5fc811f98) > > [ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S W > > OE 5.15.41-android13-8-g5ffc5644bd20 #1 > > [ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt > > [ 323.288752][ T7] Call trace: > > [ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8 > > [ 323.288772][ T7] dump_stack_lvl+0x80/0xb8 > > [ 323.288785][ T7] panic+0x180/0x444 > > [ 323.288797][ T7] find_check_fn+0x0/0x218 > > [ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204 > > [ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8 > > [ 323.288837][ T7] worker_thread+0x3ec/0x920 > > [ 323.288850][ T7] kthread+0x168/0x1dc > > [ 323.288859][ T7] ret_from_fork+0x10/0x20 > > [ 323.288866][ T7] SMP: stopping secondary CPUs > > > > And from address to line translation, we got know the issue is from > > ffs_user_copy_worker+0x1dc/0x204 > > || > > io_data->kiocb->ki_complete(io_data->kiocb, ret); > > > > And "find_check_fn" was getting invoked from ki_complete. Only thing that I > > found suspicious about ki_complete() is its argument types. That's why I > > pushed this patch here, so that we can discuss this out here. > > I think the problem is more likely whatever ->ki_complete() points to > but I have no idea what that is on your system. You're using an Android > kernel so it could be something out of tree as well... Correct, CFI would *never* trigger a failure due to passing int as long. It triggers only on prototype-implementation mismatches. The author should go and check carefully whether there are any places where some implementation differs and then ::ki_complete() gets passed with a function typecast. Also, there can be places where a proto has an argument as enum and an implementation has it as int. Compilers don't warn on such mismatches, CFI does. The latest LLVM Git snapshot with `-Wcast-function-type-strict` enabled could help hunt such. > > regards, > dan carpenter Thanks, Olek