Purpose:

These patches make the changes necessary to build the kernel as a
Position Independent Executable (PIE) on x86_64. A PIE kernel can be
relocated below the top 2G of the virtual address space, and this
patchset provides an example that allows the kernel image to be
relocated within the top 512G of the address space.

The ultimate purpose of a PIE kernel is to increase the security of the
kernel and the flexibility of the kernel image's virtual address, which
can even be in the low half of the address space. The more locations
the kernel image can be placed in, the harder it is for an attacker to
guess its address.

The patchset is based on Thomas Garnier's X86 PIE patchset v6[1] and
v11[2]. However, some design changes have been made, and some bugs
found by testing with different configurations and compilers have been
fixed.

Important changes:
- For the fixmap area, move the vsyscall page out of the fixmap area
  and unify __FIXADDR_TOP for x86. The fixmap area can then be
  relocated together with the kernel image.

- For the compile-time base address of the kernel image, keep it in the
  top 2G of the address space. Introduce a new variable to store the
  run-time base address and adapt the VA/PA conversion at run time.

- For the percpu section, keep it zero-mapped for SMP. Because the
  compile-time base address of the kernel image still resides in the
  top 2G of the address space, RIP-relative references can still be
  used when the percpu section is zero-mapped. However, when relocating
  percpu variable references, a percpu variable should be treated as a
  normal variable and absolute references should be relocated
  accordingly. In addition, the relocation offset should be subtracted
  from the GS base to ensure correct operation.

- For x86/boot/head64.c, don't build it with mcmodel=large. Instead,
  use data relocation to acquire a global symbol's value and make
  fixup_pointer() a nop when running in identity mapping. This is
  because not all global symbol references in the code use
  fixup_pointer(), e.g. variables in macros related to 5-level paging,
  which GCC can optimize into relative references. If it were built
  with mcmodel=large, there would be more fixup_pointer() calls,
  resulting in uglier code. Actually, if it were built as PIE even when
  CONFIG_X86_PIE is disabled, all fixup_pointer() calls could be
  dropped. However, the stack protector would be broken if the per-cpu
  stack protector is not supported.

Limitations:
- Since I am not familiar with XEN, it has been disabled for now as it
  is not adapted for PIE. This is due to the assignment of wrong
  pointers (with low address values) to x86_init_ops when running in
  identity mapping. This issue can be resolved by building the page
  table early and jumping to the high kernel address as soon as
  possible.

- Referencing global variables in an alternative section is not
  allowed, since RIP-relative addressing is not fixed up in
  apply_alternatives(). Fortunately, all disallowed relocations in the
  alternative section can be caught by objtool. I believe this issue
  can also be fixed by using objtool.

- For module loading, only modules without a GOT are allowed to be
  loaded, for simplicity. Only weak global variable references use the
  GOT.

Tests:
I have only tested booting with GCC 5.1.0 (min version), GCC 12.2.0 and
CLANG 15.0.7. I have also run the following tests for both the default
configuration and the Ubuntu configuration.

Performance/Size impact (GCC 12.2.0):

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.012%
 - PIE enabled: -2.219%
 instructions:
 - PIE disabled: same
 - PIE enabled: +1.383%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.589%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: same
 - PIE enabled: +2.391%
 instructions:
 - PIE disabled: +0.013%
 - PIE enabled: +1.566%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.055%

The .text section size increase is due to the additional instructions
required for PIE code. There are two reasons, both mentioned in
previous mailing list discussions.
Firstly, switch folding is disabled under PIE [3]. Secondly, PIE needs
two instructions to represent what is a single instruction with sign
extension, such as when accessing an array element: with
mcmodel=kernel only one instruction is required, whereas PIE must first
use lea to get the base of the array.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on default config).
 - PIE enabled: -2% to +2% on average (default config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.2% on ubuntu config)
 - PIE enabled: average -0.2% to +0.2%
 System Time:
 - PIE disabled: no significant change (avg -0.5% on ubuntu config)
 - PIE enabled: average -0.5% to +0.5%

[1] https://lore.kernel.org/all/20190131192533.34130-1-thgarnie@xxxxxxxxxxxx
[2] https://lore.kernel.org/all/20200228000105.165012-1-thgarnie@xxxxxxxxxxxx
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Brian Gerst (1):
  x86-64: Use per-cpu stack canary if supported by compiler

Hou Wenlong (29):
  x86/irq: Adapt assembly for PIE support
  x86,rethook: Adapt assembly for PIE support
  x86/paravirt: Use relative reference for original instruction
  x86/Kconfig: Introduce new Kconfig for PIE kernel building
  x86/PVH: Use fixed_percpu_data to set up GS base
  x86/pie: Enable stack protector only if per-cpu stack canary is
    supported
  x86/percpu: Use PC-relative addressing for percpu variable references
  x86/tools: Explicitly include autoconf.h for hostprogs
  x86/percpu: Adapt percpu references relocation for PIE support
  x86/ftrace: Adapt assembly for PIE support
  x86/pie: Force hidden visibility for all symbol references
  x86/boot/compressed: Adapt sed command to generate voffset.h when PIE
    is enabled
  x86/pie: Add .data.rel.* sections into link script
  KVM: x86: Adapt assembly for PIE support
  x86/PVH: Adapt PVH booting for PIE support
  x86/bpf: Adapt BPF_CALL JIT codegen for PIE support
  x86/modules: Adapt module loading for PIE support
  x86/boot/64: Use data relocation to get absloute address when PIE is
    enabled
  objtool: Add validation for x86 PIE support
  objtool: Adapt indirect call of __fentry__() for PIE support
  x86/pie: Build the kernel as PIE
  x86/vsyscall: Don't use set_fixmap() to map vsyscall page
  x86/xen: Pin up to VSYSCALL_ADDR when vsyscall page is out of fixmap
    area
  x86/fixmap: Move vsyscall page out of fixmap area
  x86/fixmap: Unify FIXADDR_TOP
  x86/boot: Fill kernel image puds dynamically
  x86/mm: Sort address_markers array when X86 PIE is enabled
  x86/pie: Allow kernel image to be relocated in top 512G
  x86/boot: Extend relocate range for PIE kernel image

Thomas Garnier (13):
  x86/crypto: Adapt assembly for PIE support
  x86: Add macro to get symbol address for PIE support
  x86: relocate_kernel - Adapt assembly for PIE support
  x86/entry/64: Adapt assembly for PIE support
  x86: pm-trace: Adapt assembly for PIE support
  x86/CPU: Adapt assembly for PIE support
  x86/acpi: Adapt assembly for PIE support
  x86/boot/64: Adapt assembly for PIE support
  x86/power/64: Adapt assembly for PIE support
  x86/alternatives: Adapt assembly for PIE support
  x86/ftrace: Adapt ftrace nop patching for PIE support
  x86/mm: Make the x86 GOT read-only
  x86/relocs: Handle PIE relocations

 Documentation/x86/x86_64/mm.rst              |   4 +
 arch/x86/Kconfig                             |  36 +++++-
 arch/x86/Makefile                            |  33 +++--
 arch/x86/boot/compressed/Makefile            |   2 +-
 arch/x86/boot/compressed/kaslr.c             |  55 +++++++++
 arch/x86/boot/compressed/misc.c              |   4 +-
 arch/x86/boot/compressed/misc.h              |   9 ++
 arch/x86/crypto/aegis128-aesni-asm.S         |   6 +-
 arch/x86/crypto/aesni-intel_asm.S            |   2 +-
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |   3 +-
 arch/x86/crypto/aria-aesni-avx-asm_64.S      |  30 ++---
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |  30 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |  30 ++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |   8 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |  50 ++++----
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |  44 ++++---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S    |   3 +-
 arch/x86/crypto/des3_ede-asm_64.S            |  96 ++++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |   4 +-
 arch/x86/crypto/sha256-avx2-asm.S            |  18 ++-
 arch/x86/entry/calling.h                     |  17 ++-
 arch/x86/entry/entry_64.S                    |  22 +++-
 arch/x86/entry/vdso/Makefile                 |   2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c        |   7 +-
 arch/x86/include/asm/alternative.h           |   6 +-
 arch/x86/include/asm/asm.h                   |   1 +
 arch/x86/include/asm/fixmap.h                |  28 +----
 arch/x86/include/asm/irq_stack.h             |   2 +-
 arch/x86/include/asm/kmsan.h                 |   6 +-
 arch/x86/include/asm/nospec-branch.h         |  10 +-
 arch/x86/include/asm/page_64.h               |   8 +-
 arch/x86/include/asm/page_64_types.h         |   8 ++
 arch/x86/include/asm/paravirt.h              |  17 ++-
 arch/x86/include/asm/paravirt_types.h        |  12 +-
 arch/x86/include/asm/percpu.h                |  29 ++++-
 arch/x86/include/asm/pgtable_64_types.h      |  10 +-
 arch/x86/include/asm/pm-trace.h              |   2 +-
 arch/x86/include/asm/processor.h             |  17 ++-
 arch/x86/include/asm/sections.h              |   5 +
 arch/x86/include/asm/stackprotector.h        |  16 ++-
 arch/x86/include/asm/sync_core.h             |   6 +-
 arch/x86/include/asm/vsyscall.h              |  13 ++
 arch/x86/kernel/acpi/wakeup_64.S             |  31 ++---
 arch/x86/kernel/alternative.c                |   8 +-
 arch/x86/kernel/asm-offsets_64.c             |   2 +-
 arch/x86/kernel/callthunks.c                 |   2 +-
 arch/x86/kernel/cpu/common.c                 |  15 ++-
 arch/x86/kernel/ftrace.c                     |  46 ++++++-
 arch/x86/kernel/ftrace_64.S                  |   9 +-
 arch/x86/kernel/head64.c                     |  77 +++++++++---
 arch/x86/kernel/head_64.S                    |  68 ++++++++---
 arch/x86/kernel/kvm.c                        |  21 +++-
 arch/x86/kernel/module.c                     |  27 +++++
 arch/x86/kernel/paravirt.c                   |   4 +
 arch/x86/kernel/relocate_kernel_64.S         |   2 +-
 arch/x86/kernel/rethook.c                    |   8 ++
 arch/x86/kernel/setup.c                      |   6 +
 arch/x86/kernel/vmlinux.lds.S                |  10 +-
 arch/x86/kvm/svm/vmenter.S                   |  10 +-
 arch/x86/kvm/vmx/vmenter.S                   |   2 +-
 arch/x86/lib/cmpxchg16b_emu.S                |   8 +-
 arch/x86/mm/dump_pagetables.c                |  36 +++++-
 arch/x86/mm/fault.c                          |   1 -
 arch/x86/mm/init_64.c                        |  10 +-
 arch/x86/mm/ioremap.c                        |   5 +-
 arch/x86/mm/kasan_init_64.c                  |   4 +-
 arch/x86/mm/pat/set_memory.c                 |   2 +-
 arch/x86/mm/pgtable.c                        |  13 ++
 arch/x86/mm/pgtable_32.c                     |   3 -
 arch/x86/mm/physaddr.c                       |  14 +--
 arch/x86/net/bpf_jit_comp.c                  |  17 ++-
 arch/x86/platform/efi/efi_thunk_64.S         |   4 +
 arch/x86/platform/pvh/head.S                 |  29 ++++-
 arch/x86/power/hibernate_asm_64.S            |   4 +-
 arch/x86/tools/Makefile                      |   4 +-
 arch/x86/tools/relocs.c                      | 113 ++++++++++++++++-
 arch/x86/xen/mmu_pv.c                        |  32 +++--
 arch/x86/xen/xen-asm.S                       |  10 +-
 arch/x86/xen/xen-head.S                      |  14 ++-
 include/asm-generic/vmlinux.lds.h            |  12 ++
 scripts/Makefile.lib                         |   1 +
 scripts/recordmcount.c                       |  81 ++++++++-----
 tools/objtool/arch/x86/decode.c              |  10 +-
 tools/objtool/builtin-check.c                |   4 +-
 tools/objtool/check.c                        | 121 +++++++++++++++++++
 tools/objtool/include/objtool/builtin.h      |   1 +
 86 files changed, 1202 insertions(+), 410 deletions(-)

Patchset is based on tip/master.
base-commit: 01cbd032298654fe4c85e153dd9a224e5bc10194
--
2.31.1