On Tue, Jan 23, 2024 at 03:28:04AM +0800, Xi Ruoyao wrote:
> There has been a lingering bug in LoongArch Linux systems causing some
> GCC tests to intermittently fail (see Closes link).  I've made a minimal
> reproducer:
>
>     zsh% cat measure.s
>     .align 4
>     .globl _start
>     _start:
>         movfcsr2gr $a0, $fcsr0
>         bstrpick.w $a0, $a0, 16, 16
>         beqz $a0, .ok
>         break 0
>     .ok:
>         li.w $a7, 93
>         syscall 0
>     zsh% cc measure.s -o measure -nostdlib
>     zsh% echo $((1.0/3))
>     0.33333333333333331
>     zsh% while ./measure; do ; done
>
> This while loop should not stop as POSIX is clear that execve must set
> fenv to the default, where FCSR should be zero.  But in fact it will
> just stop after running for a while (normally less than 30 seconds).
> Note that "$((1.0/3))" is needed to reproduce this issue because it
> raises FE_INEXACT and makes fcsr0 non-zero.
>
> The problem is we are currently relying on SET_PERSONALITY2() to reset
> current->thread.fpu.fcsr.  But SET_PERSONALITY2() is executed before
> start_thread() which calls lose_fpu(0).  We can see that if kernel
> preemption is enabled, we may switch to another thread after
> SET_PERSONALITY2() but before lose_fpu(0).  Then a bad thing happens:
> during the thread switch the value of the fcsr0 register is stored into
> current->thread.fpu.fcsr, making it dirty again.
>
> The issue can be fixed by setting current->thread.fpu.fcsr after
> lose_fpu(0) because lose_fpu() clears TIF_USEDFPU, then the thread
> switch won't touch current->thread.fpu.fcsr.
>
> The only other architecture setting FCSR in SET_PERSONALITY2() is MIPS.
> I've run a similar test on MIPS with mainline kernel and it turns out
> MIPS is buggy, too.  Anyway MIPS does this for supporting different FP
> flavors (NaN encodings, etc.) which do not exist on LoongArch.  So for
> LoongArch, we can simply remove the current->thread.fpu.fcsr setting
> from SET_PERSONALITY2() and do it in start_thread(), after lose_fpu(0).
>
> The while loop failing with the mainline kernel has survived one hour
> after this change on LoongArch.
>
> Fixes: 803b0fc5c3f2baa ("LoongArch: Add process management")
> Closes: https://github.com/loongson-community/discussions/issues/7
> Link: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.camel@xxxxxxxxxxx/
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>
> Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> (cherry picked from commit c2396651309eba291c15e32db8fbe44c738b5921)
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>
> ---
>
> The conflict is because 6.1.y does not have LBT support, thus there is
> no lose_lbt() line.  Resolved manually.

Now queued up, thanks.

greg k-h