Re: [PATCH 6.1.y] LoongArch: Fix and simplify fcsr initialization on execve()

On Tue, Jan 23, 2024 at 03:28:04AM +0800, Xi Ruoyao wrote:
> There has been a lingering bug in LoongArch Linux systems causing some
> GCC tests to intermittently fail (see Closes link).  I've made a minimal
> reproducer:
> 
>     zsh% cat measure.s
>     .align 4
>     .globl _start
>     _start:
>         movfcsr2gr  $a0, $fcsr0
>         bstrpick.w  $a0, $a0, 16, 16
>         beqz        $a0, .ok
>         break       0
>     .ok:
>         li.w        $a7, 93
>         syscall     0
>     zsh% cc measure.s -o measure -nostdlib
>     zsh% echo $((1.0/3))
>     0.33333333333333331
>     zsh% while ./measure; do ; done
> 
> This while loop should not stop as POSIX is clear that execve must set
> fenv to the default, where FCSR should be zero.  But in fact it will
> just stop after running for a while (normally less than 30 seconds).
> Note that "$((1.0/3))" is needed to reproduce this issue because it
> raises FE_INEXACT (bit 16 of fcsr0, the bit measure.s tests) and makes
> fcsr0 non-zero.
> 
> The problem is that we currently rely on SET_PERSONALITY2() to reset
> current->thread.fpu.fcsr.  But SET_PERSONALITY2() is executed before
> start_thread(), which calls lose_fpu(0).  If kernel preemption is
> enabled, we may switch to another thread after SET_PERSONALITY2() but
> before lose_fpu(0).  Then a bad thing happens: during the thread switch
> the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
> making it dirty again.
> 
> The issue can be fixed by setting current->thread.fpu.fcsr after
> lose_fpu(0): lose_fpu() clears TIF_USEDFPU, so a later thread switch
> won't touch current->thread.fpu.fcsr.
> 
> The only other architecture setting FCSR in SET_PERSONALITY2() is MIPS.
> I've run a similar test on MIPS with the mainline kernel and it turns
> out MIPS is buggy, too.  Anyway, MIPS does this to support different FP
> flavors (NaN encodings, etc.) which do not exist on LoongArch.  So for
> LoongArch, we can simply remove the current->thread.fpu.fcsr setting
> from SET_PERSONALITY2() and do it in start_thread(), after lose_fpu(0).
> 
> The while loop, which fails with the mainline kernel, has survived one
> hour with this change applied on LoongArch.
> 
> Fixes: 803b0fc5c3f2baa ("LoongArch: Add process management")
> Closes: https://github.com/loongson-community/discussions/issues/7
> Link: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.camel@xxxxxxxxxxx/
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>
> Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> (cherry picked from commit c2396651309eba291c15e32db8fbe44c738b5921)
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>
> ---
> 
> The conflict is because 6.1.y does not have LBT support, thus there is
> no lose_lbt() line.  Resolved manually.

Now queued up, thanks.

greg k-h
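
For readers following the ordering argument in the quoted commit message,
here is a minimal user-space sketch of the race.  It is a toy model, not
kernel code: struct toy_task and every toy_* function are hypothetical
names invented for illustration, and the model assumes only what the
commit message states (a thread switch stores the live fcsr0 value while
TIF_USEDFPU is set, and lose_fpu(0) clears that flag without saving).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_task {
    uint32_t hw_fcsr;    /* live contents of the fcsr0 register */
    uint32_t saved_fcsr; /* stands in for current->thread.fpu.fcsr */
    bool     used_fpu;   /* stands in for TIF_USEDFPU */
};

/* A thread switch stores the live register only while the flag is set. */
void toy_preempt(struct toy_task *t)
{
    if (t->used_fpu)
        t->saved_fcsr = t->hw_fcsr;
}

/* Models lose_fpu(0): drop FPU ownership without saving anything. */
void toy_lose_fpu_nosave(struct toy_task *t)
{
    t->used_fpu = false;
}

/* Old ordering: reset first, lose_fpu(0) later.  Returns the saved
 * fcsr value the new program would start with. */
uint32_t toy_execve_old(struct toy_task *t, bool preempt)
{
    t->saved_fcsr = 0;          /* reset done in SET_PERSONALITY2() */
    if (preempt)
        toy_preempt(t);         /* window: flag still set, value re-dirtied */
    toy_lose_fpu_nosave(t);     /* start_thread() */
    return t->saved_fcsr;
}

/* New ordering: lose_fpu(0) first, reset afterwards. */
uint32_t toy_execve_new(struct toy_task *t, bool preempt)
{
    if (preempt)
        toy_preempt(t);         /* may dirty the value, overwritten below */
    toy_lose_fpu_nosave(t);     /* start_thread() */
    if (preempt)
        toy_preempt(t);         /* flag clear: saved value untouched */
    t->saved_fcsr = 0;          /* reset now done in start_thread() */
    if (preempt)
        toy_preempt(t);         /* still untouched */
    return t->saved_fcsr;
}
```

Starting from a task with hw_fcsr = 0x10000 and used_fpu = true,
toy_execve_old() returns a non-zero value only when the preemption lands
in the window, while toy_execve_new() returns zero wherever it lands.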



