Hi,

On Sun, Nov 19, 2023 at 06:14:50AM -0700, Jens Axboe wrote:
> On 11/18/23 4:45 PM, Timothy Pearson wrote:
> > During floating point and vector save to thread data fr0/vs0 are clobbered
> > by the FPSCR/VSCR store routine. This leads to userspace register corruption
> > and application data corruption / crash under the following rare condition:
> >
> >  * A userspace thread is executing with VSX/FP mode enabled
> >  * The userspace thread is making active use of fr0 and/or vs0
> >  * An IPI is taken in kernel mode, forcing the userspace thread to reschedule
> >  * The userspace thread is interrupted by the IPI before accessing data it
> >    previously stored in fr0/vs0
> >  * The thread being switched in by the IPI has a pending signal
> >
> > If these exact criteria are met, then the following sequence happens:
> >
> >  * The existing thread FP storage is still valid before the IPI, due to a
> >    prior call to save_fpu() or store_fp_state(). Note that the current
> >    fr0/vs0 registers have been clobbered, so the FP/VSX state in registers
> >    is now invalid pending a call to restore_fp()/restore_altivec().
> >  * IPI -- FP/VSX register state remains invalid
> >  * interrupt_exit_user_prepare_main() calls do_notify_resume(),
> >    due to the pending signal
> >  * do_notify_resume() eventually calls save_fpu() via giveup_fpu(), which
> >    merrily reads and saves the invalid FP/VSX state to thread local storage.
> >  * interrupt_exit_user_prepare_main() calls restore_math(), writing the invalid
> >    FP/VSX state back to registers.
> >  * Execution is released to userspace, and the application crashes or corrupts
> >    data.
>
> What an epic bug hunt! Hats off to you for seeing it through and getting
> to the bottom of it. Particularly difficult as the commit that made it
> easier to trigger was in no way related to where the actual bug was.
>
> I ran this on the vm I have access to, and it survived 2x500 iterations.
> Happy to call that good:
>
> Tested-by: Jens Axboe <axboe@xxxxxxxxx>

Thanks to all involved!

Is this going to land soon in mainline so it can be picked as well for
the affected stable trees?

Regards,
Salvatore