Re: [PATCH RFC 1/3] kfence: Add a module parameter to adjust kfence objects

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Mon, 24 Jan 2022 12:32:45 +0100

On Mon, 24 Jan 2022 at 12:24, liupeng (DM) <liupeng256@xxxxxxxxxx> wrote:
>
>
> On 2022/1/24 16:19, Marco Elver wrote:
>
> On Mon, Jan 24, 2022 at 02:52AM +0000, Peng Liu wrote:
>
> KFENCE is designed to be enabled in production kernels, but it can
> be also useful in some debug situations. For machines with limited
> memory and CPU resources, KASAN is really hard to run. Fortunately,
>
> If these are arm64 based machines, see if CONFIG_KASAN_SW_TAGS works for
> you. In future, we believe that CONFIG_KASAN_HW_TAGS will be suitable
> for a variety of scenarios, including debugging scenarios of resource
> constrained environments.
>
> Thank you for your good suggestion, we will try it.
>
> KFENCE can be a suitable candidate. For KFENCE running on a single
> machine, the possibility of discovering existed bugs will increase
> as the increasing of KFENCE objects, but this will cost more memory.
> In order to balance the possibility of discovering existed bugs and
> memory cost, KFENCE objects need to be adjusted according to memory
> resources for a compiled kernel Image. Add a module parameter to
> adjust KFENCE objects will make kfence to use in different machines
> with the same kernel Image.
>
> In short, the following reasons motivate us to add this parameter.
> 1) In some debug situations, this will make kfence flexible.
> 2) For some production machines with different memory and CPU size,
> this will reduce the kernel-Image-version burden.
>
> [...]
>
> This patch (of 3):
>
> [ Note for future: No need to add "This patch (of X)" usually -- this is
>   added by maintainers if deemed appropriate, and usually includes the
>   cover letter. ]
>
> The most important motivation of this patch series is to make
> KFENCE easy-to-use in business situations.
>
> Signed-off-by: Peng Liu <liupeng256@xxxxxxxxxx>
> ---
>  Documentation/dev-tools/kfence.rst |  14 ++--
>  include/linux/kfence.h             |   3 +-
>  mm/kfence/core.c                   | 108 ++++++++++++++++++++++++-----
>  mm/kfence/kfence.h                 |   2 +-
>  mm/kfence/kfence_test.c            |   2 +-
>  5 files changed, 103 insertions(+), 26 deletions(-)
>
> [...]
>
> diff --git a/include/linux/kfence.h b/include/linux/kfence.h
> index 4b5e3679a72c..aec4f6b247b5 100644
> --- a/include/linux/kfence.h
> +++ b/include/linux/kfence.h
> @@ -17,12 +17,13 @@
>  #include <linux/atomic.h>
>  #include <linux/static_key.h>
>
> +extern unsigned long kfence_num_objects;
>  /*
>   * We allocate an even number of pages, as it simplifies calculations to map
>   * address to metadata indices; effectively, the very first page serves as an
>   * extended guard page, but otherwise has no special purpose.
>   */
> -#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
> +#define KFENCE_POOL_SIZE ((kfence_num_objects + 1) * 2 * PAGE_SIZE)
>  extern char *__kfence_pool;
>
> I appreciate the effort, but you could have gotten a quicker answer if
> you had first sent us an email to ask why adjustable number of objects
> hasn't been done before. Because if it was trivial, we would have
> already done it.
>
> What you've done is turned KFENCE_POOL_SIZE into a function instead of a
> constant (it still being ALL_CAPS is now also misleading).
>
> This is important here:
>
> /**
> * is_kfence_address() - check if an address belongs to KFENCE pool
> * @addr: address to check
> *
> * Return: true or false depending on whether the address is within the KFENCE
> * object range.
> *
> * KFENCE objects live in a separate page range and are not to be intermixed
> * with regular heap objects (e.g. KFENCE objects must never be added to the
> * allocator freelists). Failing to do so may and will result in heap
> * corruptions, therefore is_kfence_address() must be used to check whether
> * an object requires specific handling.
> *
> * Note: This function may be used in fast-paths, and is performance critical.
> * Future changes should take this into account; for instance, we want to avoid
> * introducing another load and therefore need to keep KFENCE_POOL_SIZE a
> * constant (until immediate patching support is added to the kernel).
> */
> static __always_inline bool is_kfence_address(const void *addr)
> {
> /*
> * The __kfence_pool != NULL check is required to deal with the case
> * where __kfence_pool == NULL && addr < KFENCE_POOL_SIZE. Keep it in
> * the slow-path after the range-check!
> */
> return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && __kfence_pool);
> }
>
> Unfortunately I think you missed the "Note".
>
> Which means that ultimately your patch adds another LOAD to the fast
> path, which is not an acceptable trade-off.
>
> This would mean your change would require benchmarking, but it'd also
> mean we and everyone else would have to re-benchmark _all_ systems where
> we've deployed KFENCE.
>
> I think the only reasonable way forward is if you add immediate patching
> support to the kernel as the "Note" suggests.
>
> May you give us more details about "immediate patching"?

Another option may be as follows:
Have a config for _max_ pool size. Always reserve max amount of
virtual address space, and do the range check for the max amount. But
actually allocate pages potentially for a smaller number of objects
(configured with a runtime parameter).

> In the meantime, while not a single kernel imagine, we've found that
> debug scenarios usually are best served with a custom debug kernel, as
> there are other debug features that are only Kconfig configurable. Thus,
> having a special debug kernel just configure KFENCE differently
> shouldn't be an issue in the majority of cases.
>
> Should this answer not be satisfying for you, the recently added feature
> skipping already covered allocations (configurable via
> kfence.skip_covered_thresh) alleviates some of the issue of a smaller
> pool with a very low sample interval (viz. high sample rate).
>
> The main thing to watch out for is KFENCE's actual sample rate vs
> intended sample rate (per kfence.sample_interval). If you monitor
> /sys/kernel/debug/kfence/stats, you can compute the actual sample rate.
> If the actual sample rate becomes significantly lower than the intended
> rate, only then does it make sense to increase the pool size. My
> suggestion for you is therefore to run some experiments, while adjusting
> kfence.sample_interval and kfence.skip_covered_thresh until you reach a
> sample rate that is close to intended.
>
> Thanks,
> -- Marco
> .
>
> Thank you for your patient suggestions, it's actually helpful and inspired.
> We have integrated your latest work "skipping already covered allocations",
> and will do more experiments about KFENCE. Finally, we really hope you can
> give us more introductions about "immediate patching".
>
> Thanks,
> -- Peng Liu
> .