Re: [PATCH v2 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

Marco Elver <elver@xxxxxxxxxx> · Fri, 18 Sep 2020 13:59:15 +0200

On Fri, 18 Sep 2020 at 13:17, Qian Cai <cai@xxxxxxxxxx> wrote:
>
> On Tue, 2020-09-15 at 15:20 +0200, Marco Elver wrote:
> > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > low-overhead sampling-based memory safety error detector of heap
> > use-after-free, invalid-free, and out-of-bounds access errors.  This
> > series enables KFENCE for the x86 and arm64 architectures, and adds
> > KFENCE hooks to the SLAB and SLUB allocators.
> >
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. The main motivation behind KFENCE's design is that with
> > enough total uptime KFENCE will detect bugs in code paths not typically
> > exercised by non-production test workloads. One way to quickly achieve a
> > large enough total uptime is to deploy the tool across a large fleet of
> > machines.
> >
> > KFENCE objects each reside on a dedicated page, at either the left or
> > right page boundaries. The pages to the left and right of the object
> > page are "guard pages", whose attributes are changed to a protected
> > state, and cause page faults on any attempted access to them. Such page
> > faults are then intercepted by KFENCE, which handles the fault
> > gracefully by reporting a memory access error.
> >
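
To make the layout described above concrete, here is a minimal userspace
sketch of where an object sits on its page and which accesses would land
in a neighbouring guard page. It is only an illustration: the names
(sketch_object_addr, sketch_hits_guard, SKETCH_PAGE_SIZE) are
hypothetical, 4 KiB pages are assumed, and alignment requirements are
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Start address of an object placed at the left or right edge of its
 * dedicated page (hypothetical helper; alignment is ignored).
 */
static uintptr_t sketch_object_addr(uintptr_t page_base, uint32_t size, bool right)
{
	assert(size > 0 && size <= SKETCH_PAGE_SIZE);
	assert(page_base % SKETCH_PAGE_SIZE == 0);
	return right ? page_base + SKETCH_PAGE_SIZE - size : page_base;
}

/*
 * True if addr falls in the page immediately to the left or right of the
 * object page, i.e. in one of the two guard pages.
 */
static bool sketch_hits_guard(uintptr_t page_base, uintptr_t addr)
{
	uintptr_t page = addr - addr % SKETCH_PAGE_SIZE;

	return page == page_base - SKETCH_PAGE_SIZE ||
	       page == page_base + SKETCH_PAGE_SIZE;
}
```

With a right-aligned object, the first byte past its end already lies in
the right guard page, so an off-by-one overflow would fault immediately;
a left-aligned object catches underflows the same way.
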
> > Guarded allocations are set up based on a sample interval (can be set
> > via kfence.sample_interval). After expiration of the sample interval,
> > the next allocation through the main allocator (SLAB or SLUB) returns a
> > guarded allocation from the KFENCE object pool. At this point, the timer
> > is reset, and the next allocation is set up after the expiration of the
> > interval.
> >
> > To enable/disable a KFENCE allocation through the main allocator's
> > fast-path without overhead, KFENCE relies on static branches via the
> > static keys infrastructure. The static branch is toggled to redirect the
> > allocation to KFENCE.
> >
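
The sampling behaviour described above can be sketched in userspace as a
gate that opens once per interval and is closed again by the first
allocation that passes through it. Note that this is not how the series
implements it: the static branch is replaced here by a plain timestamp
comparison, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate state; times are in milliseconds. */
struct sketch_gate {
	uint64_t next_open_ms;	/* earliest time the next guarded allocation may occur */
	uint64_t interval_ms;	/* stands in for kfence.sample_interval */
};

/*
 * Returns true if the allocation happening at now_ms should be served
 * from the guarded pool. Taking the gate re-arms it for another interval.
 */
static bool sketch_should_guard(struct sketch_gate *g, uint64_t now_ms)
{
	assert(g->interval_ms > 0);
	if (now_ms < g->next_open_ms)
		return false;
	g->next_open_ms = now_ms + g->interval_ms;
	return true;
}
```

The plain comparison costs a load and a branch on every allocation; the
point of the static key in the series is to avoid even that on the
fast-path while the gate is closed.
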
> > The KFENCE memory pool is of fixed size, and if the pool is exhausted no
> > further KFENCE allocations occur. The default config is conservative
> > with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
> > pages).
> >
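
The 2 MiB figure above can be checked with a short calculation. The
sketch below assumes each object takes one object page plus one guard
page, with one extra pair of pages on top; the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pool size in bytes, assuming (num_objects + 1) pairs of pages:
 * one object page and one guard page per object, plus one extra pair.
 */
static size_t sketch_pool_bytes(size_t num_objects, size_t page_size)
{
	assert(page_size > 0);
	return (num_objects + 1) * 2 * page_size;
}
```

For 255 objects and 4 KiB pages this gives (255 + 1) * 2 * 4096 =
2097152 bytes, i.e. 2 MiB.
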
> > We have verified by running synthetic benchmarks (sysbench I/O,
> > hackbench) that a kernel with KFENCE is performance-neutral compared to
> > a non-KFENCE baseline kernel.
> >
> > KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
> > properties. The name "KFENCE" is a homage to the Electric Fence Malloc
> > Debugger [2].
> >
> > For more details, see Documentation/dev-tools/kfence.rst added in the
> > series -- also viewable here:
>
> Does anybody else grow tired of all those different *imperfect* versions of
> in-kernel memory safety error detectors? KASAN-generic, KFENCE, KASAN-tag-based
> etc. Then, we have old things like page_poison, SLUB debugging, debug_pagealloc
> etc. which are pretty inefficient at detecting bugs these days compared to
> KASAN. Can't we work towards having a single implementation and clean up all
> that mess?

If you have suggestions on how to get a zero-overhead, precise
("perfect") memory safety error detector without new hardware
extensions, we're open to them -- many people over many years
have researched this problem, and while we're making progress for C
(and C++), the fact remains that what you're asking is likely
impossible. This might be useful background:
https://arxiv.org/pdf/1802.09517.pdf

Requirements and environments also vary across applications and use
cases. For one use case (debugging, test env), normal KASAN may be
just fine. But that doesn't work for production, where we want to
have max performance.

MTE will get us closer (no silicon yet, and ARM64 only for now), but
depending on implementation might come with small overheads, although
these should be quite acceptable for most environments given the
increasing processing power modern CPUs deliver.

Yet for other environments, where even a small performance regression
is unacceptable, and where it's infeasible to capture in tests what
the workloads execute, KFENCE is a very attractive option.

There have also been discussions on using Rust in the kernel [1], but
this is just not feasible for core kernel code in the near future
(even then, you'll still need dynamic error detection tools for all
the unsafe bits, of which there are many in an OS kernel).

[1] https://lwn.net/Articles/829858/

Thanks,
-- Marco