Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

Marco Elver <elver@xxxxxxxxxx> · Tue, 8 Sep 2020 17:31:02 +0200

On Tue, Sep 08, 2020 at 07:52AM -0700, Dave Hansen wrote:
> On 9/7/20 6:40 AM, Marco Elver wrote:
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. 
> 
> Could you talk a little bit about where you expect folks to continue to
> use KASAN?  How would a developer or a tester choose which one to use?

We mention some of this in Documentation/dev-tools/kfence.rst:

	In the kernel, several tools exist to debug memory access errors, and in
	particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
	is more precise, relying on compiler instrumentation, this comes at a
	performance cost. We want to highlight that KASAN and KFENCE are complementary,
	with different target environments. For instance, KASAN is the better
	debugging-aid, where a simple reproducer exists: due to the lower chance to
	detect the error, it would require more effort using KFENCE to debug.
	Deployments at scale, however, would benefit from using KFENCE to discover bugs
	due to code paths not exercised by test cases or fuzzers.

If you can afford to use KASAN, continue using KASAN. Usually this only
applies to test environments. If you have kernels for production use,
and cannot enable KASAN for the obvious cost reasons, you could consider
KFENCE.

I'll try to make this clearer, maybe summarizing what I said here in
Documentation as well.

> > KFENCE objects each reside on a dedicated page, at either the left or
> > right page boundaries. The pages to the left and right of the object
> > page are "guard pages", whose attributes are changed to a protected
> > state, and cause page faults on any attempted access to them. Such page
> > faults are then intercepted by KFENCE, which handles the fault
> > gracefully by reporting a memory access error.
> 
> How much memory overhead does this end up having?  I know it depends on
> the object size and so forth.  But, could you give some real-world
> examples of memory consumption?  Also, what's the worst case?  Say I
> have a ton of worst-case-sized (32b) slab objects.  Will I notice?

The number of KFENCE objects is limited (default 255). If we exhaust
KFENCE's memory pool, no more KFENCE allocations will occur, so the
memory overhead is bounded by the pool size regardless of object size.
Documentation/dev-tools/kfence.rst gives a formula to calculate the
KFENCE pool size:

	The total memory dedicated to the KFENCE memory pool can be computed as::

	    ( #objects + 1 ) * 2 * PAGE_SIZE

	Using the default config, and assuming a page size of 4 KiB, results in
	dedicating 2 MiB to the KFENCE memory pool.
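
As a rough sketch of that arithmetic (plain userspace C, not kernel code;
the helper name kfence_pool_bytes() is made up for illustration and does
not exist in the patch series):

```c
#include <stddef.h>

/*
 * Hypothetical helper: evaluates the formula quoted from kfence.rst,
 *     (#objects + 1) * 2 * PAGE_SIZE
 * Both arguments are supplied by the caller; nothing is read from the
 * kernel configuration.
 */
static size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
	return (num_objects + 1) * 2 * page_size;
}
```

With the default of 255 objects and 4 KiB pages, kfence_pool_bytes(255, 4096)
gives 2097152 bytes, i.e. the 2 MiB mentioned above.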

Does that answer the question? Or is there anything else that would
help clarify this?

Thanks,
-- Marco