[PATCH v10 00/15] Function Granular KASLR

Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> · Wed, 9 Feb 2022 19:57:37 +0100

From: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>

Function Granular Kernel Address Space Layout Randomization (FG-KASLR)
---------------------------------------------------------------------

This is an implementation of finer grained kernel address space
randomization. It rearranges the kernel code at load time on a
per-function level granularity, with only around a second added
to boot time.

Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks reuse existing code
snippets to get around existing memory protections. They exploit software
bugs which expose addresses of useful code snippets to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less
predictable.
The order of the code within the segment is unchanged - only the base
address is shifted. There are a few shortcomings to this algorithm.

1. Low Entropy - there are only so many locations the kernel can fit in.
   This means an attacker could guess without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
   exposing all other locations for a published/known kernel image.
3. Info leaks abound.
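
To illustrate shortcoming 2, here is a minimal sketch of the arithmetic
involved. The symbol names, offsets and base address are made up for the
example; they stand in for what could be read from the symbol table of a
published kernel image.

```python
# Hypothetical offsets of symbols from the start of the kernel text,
# as they could be read from a published/known image.
KNOWN_OFFSETS = {
    "func_a": 0x001000,
    "func_b": 0x0A2340,
    "func_c": 0x1F7C80,
}


def recover_layout(leaked_symbol, leaked_address, offsets=KNOWN_OFFSETS):
    """Derive the randomized base and every symbol address from one leak."""
    base = leaked_address - offsets[leaked_symbol]
    return base, {name: base + off for name, off in offsets.items()}


# Pretend the kernel was loaded at a random base and one address leaked.
secret_base = 0xFFFFFFFF9A000000
leak = secret_base + KNOWN_OFFSETS["func_b"]

base, layout = recover_layout("func_b", leak)
print(hex(base), {name: hex(addr) for name, addr in layout.items()})
```

Because the order of the code is unchanged, one subtraction is enough to
locate everything else. With function reordering, the offsets between
functions are no longer constant across boots.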

Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many
variations possible. Function reordering is an implementation of finer
grained ASLR which randomizes the layout of an address space on a function
level granularity. We use the term "fgkaslr" in this document to refer to
the technique of function reordering when used with KASLR, as well as finer
grained KASLR in general.

Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have. It is designed to work in
combination with the existing solution. The implementation is really pretty
simple, and there are two main areas where changes occur:

* Build time

GCC has had an option to place functions into individual .text sections for
many years now. This option can be used to implement function reordering at
load time. The final compiled vmlinux retains all the section headers,
which can be used to help find the address ranges of each function. Using
this information and an expanded table of relocation addresses, individual
text sections can be shuffled immediately after decompression. Some data
tables inside the kernel that have assumptions about order require
re-sorting after being updated when applying relocations. In order to
modify these tables, a few key symbols are excluded from the objcopy symbol
stripping process for use after shuffling the text segments.

Some highlights from the build time changes to look for:

The top level kernel Makefile was modified to add the gcc flag if it
is supported. Currently, I am applying this flag to everything it is
possible to randomize. Anything that is written in C and not present in a
special input section is randomized. The final binary segment 0 retains a
consolidated .text section, as well as all the individual .text.* sections.
Future work could turn off this flag for selected files or even entire
subsystems, although obviously at the cost of security.

The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit.

A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.

* Load time

The boot kernel was modified to parse the vmlinux elf file after
decompression to check for our interesting symbols that we kept, and to
look for any .text.* sections to randomize. The consolidated .text section
is skipped and not moved. The sections are shuffled randomly, and copied
into memory following the .text section in a new random order. The existing
code which updated relocation addresses was modified to account for
not just a fixed delta from the load address, but the offset that the
function section was moved to. This requires inspection of each address to
see if it was impacted by a randomization. We use a bsearch to make this
less horrible on performance. Any tables that need to be modified with new
addresses or resorted are updated using the symbol addresses parsed from
the elf symbol table.
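
The shuffle and the per-address lookup described above can be sketched as
follows. This is an illustration in Python, not the boot code itself: the
function names, the tuple layout and the example addresses are assumptions
made for the sketch, and the fixed KASLR delta from the load address would
still be applied on top of the per-section delta shown here.

```python
import bisect
import random


def shuffle_sections(sections, text_end, rng, align=16):
    """Lay out sections in random order right after the fixed .text.

    sections: list of (name, old_start, size) tuples.
    Returns (old_start, old_end, delta) tuples sorted by old_start.
    """
    order = list(sections)
    rng.shuffle(order)  # random permutation of the sections
    moved = []
    cursor = text_end
    for _name, old_start, size in order:
        cursor = (cursor + align - 1) & ~(align - 1)  # keep alignment
        moved.append((old_start, old_start + size, cursor - old_start))
        cursor += size
    moved.sort()  # sorted by old address so it can be binary searched
    return moved


def make_adjuster(moved):
    """Build a lookup that maps an old address to its new location."""
    starts = [old_start for old_start, _, _ in moved]

    def adjust(addr):
        i = bisect.bisect_right(starts, addr) - 1  # binary search
        if i >= 0:
            _old_start, old_end, delta = moved[i]
            if addr < old_end:
                return addr + delta  # address was in a moved section
        return addr  # e.g. inside the consolidated .text: unchanged

    return adjust


# Hypothetical sections following a .text that ends at 0x2000.
sections = [
    (".text.foo", 0x2000, 0x40),
    (".text.bar", 0x2040, 0x100),
    (".text.baz", 0x2140, 0x30),
]
adjust = make_adjuster(shuffle_sections(sections, 0x2000, random.Random()))
print(hex(adjust(0x1000)), hex(adjust(0x2010)))
```

Every relocation target goes through adjust(), which is why the lookup
cost matters: with tens of thousands of sections, a linear scan per
address would be far more expensive than a binary search.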

In order to hide our new layout, symbols reported through /proc/kallsyms
will be displayed in a random order.

Security Considerations
-----------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). This code will not prevent all attacks,
but should instead be considered as one of several tools that can be used.
In particular, this code is meant to make KASLR more effective in the
presence of info leaks.

How much entropy we are adding to the existing entropy of standard KASLR
will depend on a few variables. Firstly and most obviously, the number of
functions that are randomized matters. This implementation keeps the
existing .text section for code that cannot be randomized - for example,
because it was assembly code. The fewer sections to randomize, the less
entropy. In addition, due to alignment (16 bytes for x86_64), the number
of bits in an address that the attacker needs to guess is reduced, as the
lower bits are identical.
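
As a rough illustration of these two effects, the sketch below computes
log2(n!) for a given number of sections and the number of low address bits
fixed by an alignment. The section counts are hypothetical, and log2(n!)
only counts the possible orderings; it is an upper bound on the layout
entropy, not a measure of how hard it is to locate one particular function.

```python
import math


def permutation_bits(n_sections):
    """log2(n!): bits needed to describe one ordering of n sections."""
    return math.lgamma(n_sections + 1) / math.log(2)


def fixed_low_bits(alignment):
    """Number of low address bits that are always zero for an alignment."""
    return alignment.bit_length() - 1


# Hypothetical section counts, only to show how the bound scales.
for n in (1, 1000, 40000):
    print(n, "sections:", round(permutation_bits(n)), "bits of ordering")

print("16-byte alignment fixes", fixed_low_bits(16), "low bits")
```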

Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.

* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the
kernel in several places. It requires additional parsing of the kernel ELF
file to obtain the section headers of the sections to be randomized. It
calls the random number generator for each section to be randomized to
determine that section's new memory location. It copies the decompressed
kernel into a new area of memory to avoid corruption when laying out the
newly randomized sections. It increases the number of relocations the
kernel has to perform at boot time vs. standard KASLR, and it also requires
a lookup on each address that needs to be relocated to see if it was in a
randomized section and needs to be adjusted by a new offset. Finally, it
re-sorts a few data tables that are required to be sorted by address.
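
The last step can be sketched as follows. The table contents and the
relocate() mapping are made up for the example (two adjacent sections
swapping places); the point is only that updating addresses can break the
ordering that a sorted table relies on.

```python
def relocate_and_resort(table, relocate):
    """Apply relocate() to each entry's address, then restore address order.

    table: list of (address, payload) tuples sorted by address.
    relocate: function mapping an old address to its new address.
    """
    updated = [(relocate(addr), payload) for addr, payload in table]
    updated.sort(key=lambda entry: entry[0])
    return updated


def swap_two_sections(addr):
    """Hypothetical shuffle: sections at 0x1000 and 0x1100 trade places."""
    if 0x1000 <= addr < 0x1100:
        return addr + 0x100
    if 0x1100 <= addr < 0x1200:
        return addr - 0x100
    return addr


table = [(0x1010, "entry in first section"), (0x1120, "entry in second section")]
print(relocate_and_resort(table, swap_two_sections))
```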

Booting a test VM on a modern, well appointed system showed an increase in
latency of approximately 1 second.

* Run time
The performance impact at run-time of function reordering varies by
workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with
standard KASLR. Analysis with perf showed a slightly higher percentage of
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Because the layout
changes per boot, each time a system is rebooted the performance of a
workload may change.

Future work could identify hot areas that may not be randomized and either
leave them in the .text section or group them together into a single
section that may be randomized. If grouping things together helps, one
other thing to consider is that if we could identify text blobs that should
be grouped together to benefit a particular code flow, it could be
interesting to explore whether this security feature could also be used
as a performance feature if you are interested in optimizing your kernel
layout for a particular workload at boot time. Optimizing function layout
for a particular workload has been researched and proven effective - for
more information read the Facebook paper "Optimizing Function Placement
for Large-Scale Data-Center Applications" (see references section below).

Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file.
With a standard distro config, the resulting vmlinux was increased by
about 3%. The compressed image is also increased due to the section headers,
as well as the extra relocations that must be added. You can expect
fgkaslr to increase the size of the compressed image by about 15%.

Memory Usage
------------
fgkaslr increases the amount of heap that is required at boot time,
although this extra memory is released when the kernel has finished
decompression. As a result, it may not be appropriate to use this feature
on systems without much memory.

Building
--------
To enable fine grained KASLR, you need to have the following config options
set (including all the ones you would use to build normal KASLR):

CONFIG_FG_KASLR=y

In addition, fgkaslr is only supported for the X86_64 architecture.

Modules
-------
Modules are randomized similarly to the rest of the kernel by shuffling
the sections at load time prior to moving them into memory. The module
must also have been built with the -ffunction-sections compiler option.

Although fgkaslr for the kernel is only supported for the X86_64
architecture, it is possible to use fgkaslr with modules on other
architectures. To enable this feature, select

CONFIG_MODULE_FG_KASLR=y

This option is selected automatically for X86_64 when CONFIG_FG_KASLR is
set.

Disabling
---------
Disabling normal KASLR using the nokaslr command line option also disables
fgkaslr. It is also possible to disable fgkaslr separately by booting with
nofgkaslr on the command line.
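
The relationship between the two options can be summarized with a small
sketch. It assumes a kernel built with both features enabled and treats
the command line as a plain whitespace-separated string; the function name
is made up, and the real command line parsing in the kernel is more
involved than this.

```python
def randomization_state(cmdline):
    """Return (kaslr_enabled, fgkaslr_enabled) for a kernel command line."""
    args = cmdline.split()
    kaslr = "nokaslr" not in args
    # fgkaslr builds on KASLR, so nokaslr turns it off as well.
    fgkaslr = kaslr and "nofgkaslr" not in args
    return kaslr, fgkaslr


for cmdline in ("quiet", "quiet nofgkaslr", "quiet nokaslr"):
    print(repr(cmdline), "->", randomization_state(cmdline))
```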

References
----------
There are a lot of academic papers which explore finer grained ASLR.
This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:

Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.

For more information on how function layout impacts performance, see:

Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher ([0]).

Alexander Lobakin:

Starting from v6, the project has a new main developer; please see
the changelog for details.

The current revision has been compile-time and runtime tested on the
following setups with no issues:
- x86_64, GCC 11, Binutils 2.35;
- x86_64, Clang/LLVM 13, ClangLTO + ClangCFI (from Sami's tree).

Some numbers for comparison:

feat        make -j65 boot    vmlinux.o vmlinux  bzImage  bogoops/s
Relocatable 4m38.478s 24.440s 72014208  58579520  9396192 57640.39
KASLR       4m39.344s 24.204s 72020624  87805776  9740352 57393.80
FG-K 16 fps 6m16.493s 25.429s 83759856  87194160 10885632 57784.76
FG-K 8 fps  6m20.190s 25.094s 83759856  88741328 10985248 56625.84
FG-K 1 fps  7m09.611s 25.922s 83759856  95681128 11352192 56953.99

The legend:
* make -j65 -- the compilation time of a kernel tree with the named
  option enabled (and -j$(($(nproc) + 1))) (with the build machine
  running the same stock kernel for all entries), given mainly to show
  how linkers choke on big LD scripts;
* boot -- time elapsed from starting the kernel by the bootloader
  to login prompt, affected mostly by the main FG-KASLR preboot
  loop which shuffles function sections;
* vmlinux.o -- the size of the final vmlinux.o, altered by relocs
  and -ffunction-sections;
* vmlinux -- the size of the final vmlinux, depends directly on the
  number of (function) sections;
* bzImage -- the size of the final compressed kernel, same as with
  vmlinux;
* bogoops/s -- stress-ng -c$(nproc) results on the kernel with the
  named feature enabled;
* fps -- the number of functions per section, controlled by
  CONFIG_FG_KASLR_SHIFT and CONFIG_MODULE_FG_KASLR_SHIFT.
  16 fps means shift = 4, 8 fps means shift = 3, 1 fps means shift = 0.
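
To put the bzImage column into relative terms, the following sketch
computes the growth of each FG-KASLR configuration over the plain KASLR
build. The byte counts are copied from the table above; the helper name
is made up for the example.

```python
# bzImage sizes in bytes, taken from the comparison table.
BZIMAGE_BYTES = {
    "KASLR": 9740352,
    "FG-K 16 fps": 10885632,
    "FG-K 8 fps": 10985248,
    "FG-K 1 fps": 11352192,
}


def growth_percent(size, baseline):
    """Relative growth of size over baseline, in percent."""
    return (size - baseline) / baseline * 100.0


baseline = BZIMAGE_BYTES["KASLR"]
for name, size in BZIMAGE_BYTES.items():
    if name != "KASLR":
        print(f"{name}: +{growth_percent(size, baseline):.1f}%")
```

The growth ranges from roughly 12% with 16 functions per section to
roughly 17% with one function per section.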