On 5/30/21 3:02 PM, Thomas Gleixner wrote: > Andy, > > On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote: >> >> Cc: stable@xxxxxxxxxxxxxxx >> Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates") >> Reported-by: syzbot+2067e764dbcd10721e2e@xxxxxxxxxxxxxxxxxxxxxxxxx > > Debugged-by ... > >> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx> > > ... > >> /* >> - * Clear the FPU state back to init state. >> - * >> - * Called by sys_execve(), by the signal handler code and by various >> - * error paths. >> + * Reset current's user FPU states to the init states. The caller promises >> + * that current's supervisor states (in memory or CPU regs as appropriate) >> + * as well as the XSAVE header in memory are intact. ^^^ The caller promises this >> */ >> -static void fpu__clear(struct fpu *fpu, bool user_only) >> +void fpu__clear_user_states(struct fpu *fpu) >> { >> WARN_ON_FPU(fpu != ¤t->thread.fpu); >> ... >> + /* >> + * Reset user states in registers. >> + */ >> + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); >> + >> + /* >> + * Now all FPU registers have their desired values. Inform the >> + * FPU state machine that current's FPU registers are in the >> + * hardware registers. >> + */ >> fpregs_mark_activate(); >> + >> fpregs_unlock(); > > This is as wrong as before. The corrupted task->fpu.state still > survives. See above. This function is not intended to fix the header. > > For f*cks sake, I gave you a reproducer and a working patch and I > explained it in great length what's broken and what needs to be fixed. This patch fixes your reproducer and my (to-be-sent) reproducer. I tested it on a machine that physically has the XRSTORS instruction but I disabled it using virt. Are you still seeing failures with this patch applied? I can try to test on a different CPU. > > And of course you kept the bug which was in the offending commit, > i.e. not wiping the task->fpu.state corruption which causes the next > XRSTOR to fail: > > [ 34.095020] Bad FPU state detected at copy_kernel_to_fpregs+0x28/0x40, reinitializing FPU registers. > [ 34.095052] WARNING: CPU: 0 PID: 1364 at arch/x86/mm/extable.c:65 ex_handler_fprestore+0x5f/0x70 > ... > [ 34.153472] switch_fpu_return+0x40/0xb0 > [ 34.154196] exit_to_user_mode_prepare+0x8f/0x180 > [ 34.155060] syscall_exit_to_user_mode+0x23/0x50 > [ 34.155912] do_syscall_64+0x4d/0xb0 I don't think that the code path that calls fpu__clear_user_states() is subject to header corruption. If it is, then both your patch and my patch have issues -- trying to fish the supervisor state out of a corrupt XSTATE buffer is asking for trouble. > > IOW, this is exactly the same shit as we had before. So what is decent > about this? Define decent... > > Why the heck do you think I wasted a couple of days to: > > - Analyze the root cause > > - Destill a trivial C reproducer > > - Come up with a fully working and completely correct fix > > Just because, right? > > I'm fine with splitting up clear_all() and clear_user(), but what you > provided is as much of a clusterfuck as the commit it pretends to fix. > > Your's seriously grumpy > > Thomas >