The patch titled
     Subject: arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
has been added to the -mm mm-nonmm-unstable branch. Its filename is
     arch-add-arch_has_kernel_fpu_support.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/arch-add-arch_has_kernel_fpu_support.patch

This patch will later appear in the mm-nonmm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Samuel Holland <samuel.holland@xxxxxxxxxx>
Subject: arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
Date: Wed, 27 Mar 2024 13:00:32 -0700

Patch series "Unified cross-architecture kernel-mode FPU API", v3.

This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU self
test.
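As a rough illustration of these semantics (availability queried once via kernel_fpu_available(), which this patch adds, then non-reentrant kernel_fpu_begin()/kernel_fpu_end() pairs bracketing FP work), here is a minimal userspace sketch. The three API names come from this series, but their bodies below are hypothetical mocks, not any architecture's implementation. fp_scale_by_1_5(), example_init(), example_scale() and mock_fpu_present are made-up names for illustration. Everything sits in one file only so the sketch compiles on its own; in the kernel the FP routine would live in a separate file built with $(CC_FLAGS_FPU).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-ins for the <linux/fpu.h> API so the
 * calling pattern compiles outside the kernel. Real implementations
 * are provided by each architecture.
 */
static bool mock_fpu_present = true;	/* hypothetical platform flag */
static bool fpu_in_section;

static bool kernel_fpu_available(void)
{
	return mock_fpu_present;
}

static void kernel_fpu_begin(void)
{
	assert(!fpu_in_section);	/* not reentrant: nesting is a bug */
	fpu_in_section = true;
}

static void kernel_fpu_end(void)
{
	assert(fpu_in_section);
	fpu_in_section = false;
}

/*
 * Hypothetical FP routine. Its interface uses only integers, so the
 * non-FP caller never handles float or double values itself.
 */
static int fp_scale_by_1_5(int value)
{
	return (int)(value * 1.5);
}

/* Hypothetical caller state: availability is queried once, at init. */
static bool example_have_fpu;

static void example_init(void)
{
	example_have_fpu = kernel_fpu_available();
}

static int example_scale(int value)
{
	int result;

	if (!example_have_fpu)
		return value + value / 2;	/* integer-only fallback */

	/* Keep the critical section as small as possible. */
	kernel_fpu_begin();
	result = fp_scale_by_1_5(value);
	kernel_fpu_end();
	return result;
}
```

Calling example_init() once and then example_scale(10) takes the FP path and returns 15; with mock_fpu_present set to false the same call uses the integer fallback instead.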
The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.


This patch (of 14):

Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures, and
FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space FPU
support is available. Architectures selecting this option must implement
what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide
the appropriate CFLAGS for compiling floating-point C code.

Link: https://lkml.kernel.org/r/20240327200157.1097089-1-samuel.holland@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20240327200157.1097089-2-samuel.holland@xxxxxxxxxx
Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
Suggested-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Cc: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Masahiro Yamada <masahiroy@xxxxxxxxxx>
Cc: Nathan Chancellor <nathan@xxxxxxxxxx>
Cc: Nicolas Schier <nicolas@xxxxxxxxx>
Cc: Russell King <linux@xxxxxxxxxxxxxxx>
Cc: Samuel Holland <samuel.holland@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
Cc: WANG Xuerui <git@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/core-api/floating-point.rst |   78 ++++++++++++++++++++
 Documentation/core-api/index.rst          |    1 
 Makefile                                  |    5 +
 arch/Kconfig                              |    6 +
 include/linux/fpu.h                       |   12 +++
 5 files changed, 102 insertions(+)

--- a/arch/Kconfig~arch-add-arch_has_kernel_fpu_support
+++ a/arch/Kconfig
@@ -1569,6 +1569,12 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG
 	  address translations. Page table walkers that clear the accessed bit
 	  may use this capability to reduce their search space.
 
+config ARCH_HAS_KERNEL_FPU_SUPPORT
+	bool
+	help
+	  Architectures that select this option can run floating-point code in
+	  the kernel, as described in Documentation/core-api/floating-point.rst.
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
--- /dev/null
+++ a/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+    CFLAGS_foo.o += $(CC_FLAGS_FPU)
+    CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+    CC_FLAGS_FPU := -mhard-float
+
+or::
+
+    CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+	This function reports if floating-point code can be used on this CPU or
+	platform. The value returned by this function is not expected to change
+	at runtime, so it only needs to be called once, not before every
+	critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+                void kernel_fpu_end( void )
+
+	These functions create a floating-point critical section. It is only
+	valid to call ``kernel_fpu_begin()`` after a previous call to
+	``kernel_fpu_available()`` returned ``true``. These functions are only
+	guaranteed to be callable from (preemptible or non-preemptible) process
+	context.
+
+	Preemption may be disabled inside critical sections, so their size
+	should be minimized. They are *not* required to be reentrant. If the
+	caller expects to nest critical sections, it must implement its own
+	reference counting.
--- a/Documentation/core-api/index.rst~arch-add-arch_has_kernel_fpu_support
+++ a/Documentation/core-api/index.rst
@@ -48,6 +48,7 @@ Library functionality that is used throu
    errseq
    wrappers/atomic_t
    wrappers/atomic_bitops
+   floating-point
 
 Low level entry and exit
 ========================
--- /dev/null
+++ a/include/linux/fpu.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_FPU_H
+#define _LINUX_FPU_H
+
+#ifdef _LINUX_FPU_COMPILATION_UNIT
+#error FP code must be compiled separately. See Documentation/core-api/floating-point.rst.
+#endif
+
+#include <asm/fpu.h>
+
+#endif
--- a/Makefile~arch-add-arch_has_kernel_fpu_support
+++ a/Makefile
@@ -964,6 +964,11 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
 export CC_FLAGS_CFI
 endif
 
+# Architectures can define flags to add/remove for floating-point support
+CC_FLAGS_FPU += -D_LINUX_FPU_COMPILATION_UNIT
+export CC_FLAGS_FPU
+export CC_FLAGS_NO_FPU
+
 ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
 # Set the minimal function alignment. Use the newer GCC option
 # -fmin-function-alignment if it is available, or fall back to -falign-funtions.
_

Patches currently in -mm which might be from samuel.holland@xxxxxxxxxx are

arch-add-arch_has_kernel_fpu_support.patch
arm-implement-arch_has_kernel_fpu_support.patch
arm-crypto-use-cc_flags_fpu-for-neon-cflags.patch
arm64-implement-arch_has_kernel_fpu_support.patch
arm64-crypto-use-cc_flags_fpu-for-neon-cflags.patch
lib-raid6-use-cc_flags_fpu-for-neon-cflags.patch
loongarch-implement-arch_has_kernel_fpu_support.patch
powerpc-implement-arch_has_kernel_fpu_support.patch
x86-implement-arch_has_kernel_fpu_support.patch
riscv-add-support-for-kernel-mode-fpu.patch
drm-amd-display-use-arch_has_kernel_fpu_support.patch
selftests-fpu-move-fp-code-to-a-separate-translation-unit.patch
selftests-fpu-allow-building-on-other-architectures.patch
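The floating-point.rst text added by this patch states that kernel_fpu_begin()/kernel_fpu_end() are not required to be reentrant, and that a caller which nests critical sections must implement its own reference counting. A minimal sketch of such caller-side counting follows, assuming a single-threaded userspace setting: the begin/end bodies are hypothetical mocks that merely detect nesting, and example_fpu_get(), example_fpu_put(), example_fpu_depth and begin_calls are made-up names. A plain static counter is only meaningful in this sketch; real kernel code would have to decide where the counter lives (for example per CPU or per task) consistently with the architecture's preemption behaviour inside the section.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end() that
 * trap on nesting, mirroring the "not required to be reentrant" rule.
 */
static bool fpu_in_section;
static int begin_calls;

static void kernel_fpu_begin(void)
{
	assert(!fpu_in_section);
	fpu_in_section = true;
	begin_calls++;
}

static void kernel_fpu_end(void)
{
	assert(fpu_in_section);
	fpu_in_section = false;
}

/*
 * Hypothetical caller-side wrappers: only the outermost get/put pair
 * actually enters or leaves the FP critical section.
 */
static unsigned int example_fpu_depth;

static void example_fpu_get(void)
{
	if (example_fpu_depth++ == 0)
		kernel_fpu_begin();
}

static void example_fpu_put(void)
{
	assert(example_fpu_depth > 0);
	if (--example_fpu_depth == 0)
		kernel_fpu_end();
}
```

With this scheme an outer function and a helper it calls can both bracket their work with example_fpu_get()/example_fpu_put(); kernel_fpu_begin() runs once, and the section stays open until the outermost put.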