Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

Marco Elver <elver@xxxxxxxxxx> · Tue, 8 Sep 2020 17:56:31 +0200

On Tue, Sep 08, 2020 at 05:36PM +0200, Vlastimil Babka wrote:
> On 9/8/20 5:31 PM, Marco Elver wrote:
> >> 
> >> How much memory overhead does this end up having?  I know it depends on
> >> the object size and so forth.  But, could you give some real-world
> >> examples of memory consumption?  Also, what's the worst case?  Say I
> >> have a ton of worst-case-sized (32b) slab objects.  Will I notice?
> > 
> > KFENCE objects are limited (default 255). If we exhaust KFENCE's memory
> > pool, no more KFENCE allocations will occur.
> > Documentation/dev-tools/kfence.rst gives a formula to calculate the
> > KFENCE pool size:
> > 
> > 	The total memory dedicated to the KFENCE memory pool can be computed as::
> > 
> > 	    ( #objects + 1 ) * 2 * PAGE_SIZE
> > 
> > 	Using the default config, and assuming a page size of 4 KiB, results in
> > 	dedicating 2 MiB to the KFENCE memory pool.
> > 
> > Does that clarify this point? Or anything else that could help clarify
> > this?
> 
> Hmm did you observe that with this limit, a long-running system would eventually
> converge to KFENCE memory pool being filled with long-aged objects, so there
> would be no space to sample new ones?

Sure, that's a possibility. But remember that we're not trying to
deterministically detect bugs on one system (if you want that, you
should use KASAN), but across a fleet of machines! The non-determinism
of which allocations end up in KFENCE ensures that the machines in a
fleet won't all end up with identical allocations in their pools.
That's exactly what we're after. Even if each machine eventually
exhausts its pool, you'll still detect bugs if there are any, since
different machines will have sampled different allocations.

If you are overly worried, increase either the sample interval (so the
pool fills more slowly) or the number of available objects. The default
of 255 is quite conservative, and even a considerably larger pool is
hardly noticeable on a modern system. When choosing the sample interval
and number of objects, also factor in how many machines you plan to
deploy this on. Monitoring /sys/kernel/debug/kfence/stats can help you
here.

Thanks,
-- Marco