On Thu, Oct 22, 2020 at 3:19 PM Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
>
> This patchset is not complete (hence sending as RFC), but I would like to
> start the discussion now and hear people's opinions regarding the
> questions mentioned below.
>
> === Overview
>
> This patchset adopts the existing hardware tag-based KASAN mode [1] for
> use in production as a memory corruption mitigation. Hardware tag-based
> KASAN relies on arm64 Memory Tagging Extension (MTE) [2] to perform memory
> and pointer tagging. Please see [3] and [4] for detailed analysis of how
> MTE helps to fight memory safety problems.
>
> The current plan is to reuse CONFIG_KASAN_HW_TAGS for production, but add a
> boot time switch that allows choosing between a debugging mode, which
> includes all KASAN features as they are, and a production mode, which only
> includes the essentials like tag checking.
>
> It is essential that switching between these modes doesn't require
> rebuilding the kernel with different configs, as this is required by the
> Android GKI initiative [5].
>
> The patch titled "kasan: add and integrate kasan boot parameters" of this
> series adds a few new boot parameters:
>
> kasan.mode allows choosing one of three main modes:
>
> - kasan.mode=off - no checks at all
> - kasan.mode=prod - only essential production features
> - kasan.mode=full - all features
>
> Those mode configs provide default values for three more internal configs
> listed below. However it's also possible to override the default values
> by providing:
>
> - kasan.stack=off/on - enable stacks collection
>                        (default: on for mode=full, otherwise off)
> - kasan.trap=async/sync - use async or sync MTE mode
>                        (default: sync for mode=full, otherwise async)
> - kasan.fault=report/panic - only report MTE fault or also panic
>                        (default: report)
>
> === Benchmarks
>
> For now I've only performed a few simple benchmarks such as measuring
> kernel boot time and slab memory usage after boot.
> The benchmarks were performed in QEMU and the results below exclude the
> slowdown caused by QEMU memory tagging emulation (as it's different from
> the slowdown that will be introduced by hardware and therefore
> irrelevant).
>
> KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory
> impact compared to KASAN_HW_TAGS=n.
>
> kasan.mode=prod (without executing the tagging instructions) introduces
> 7% of both performance and memory impact compared to kasan.mode=off.
> Note that 4% of performance and all 7% of memory impact are caused by the
> fact that enabling KASAN essentially results in CONFIG_SLAB_MERGE_DEFAULT
> being disabled.
>
> Recommended Android config has CONFIG_SLAB_MERGE_DEFAULT disabled (I assume
> for security reasons), but Pixel 4 has it enabled. It's arguable whether
> "disabling" CONFIG_SLAB_MERGE_DEFAULT introduces any security benefit on
> top of MTE. Without MTE it makes exploiting some heap corruption harder.
> With MTE it will only make it harder provided that the attacker is able to
> predict allocation tags.
>
> kasan.mode=full has 40% performance and 30% memory impact over
> kasan.mode=prod. Both come from alloc/free stack collection.
>
> === Questions
>
> Any concerns about the boot parameters?

For boot parameters I think we are now "safe" in the sense that we
provide maximum possible flexibility and can defer any actual
decisions.

> Should we try to deal with CONFIG_SLAB_MERGE_DEFAULT-like behavior mentioned
> above?

How hard is it to allow KASAN with CONFIG_SLAB_MERGE_DEFAULT? Are there
any fundamental conflicts?
The numbers you provided look quite substantial (on par with what MTE
itself may introduce). So I would assume that if a vendor does not have
CONFIG_SLAB_MERGE_DEFAULT disabled, it may not want to disable it because
of MTE (that effectively doubles the overhead).
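
To make the quoted parameter defaults concrete, here is a minimal sketch
of how kasan.mode could supply defaults for kasan.stack, kasan.trap and
kasan.fault, with explicit values overriding them. This is a plain
Python model of the description in the cover letter, not the kernel
implementation: the function name, the dictionary layout and the
fallback mode used when kasan.mode is absent are all assumptions made
for illustration.

```python
# Illustrative model only; names and structure are hypothetical, not kernel code.

# Values accepted by each override parameter, as listed in the cover letter.
ALLOWED = {
    "stack": {"off", "on"},
    "trap": {"async", "sync"},
    "fault": {"report", "panic"},
}
MODES = {"off", "prod", "full"}


def resolve_kasan_params(cmdline, default_mode="full"):
    """Resolve kasan.* settings from a kernel command line string.

    default_mode is an ASSUMPTION: the cover letter does not say which
    mode applies when kasan.mode is not given.
    """
    # Collect every kasan.<key>=<value> token from the command line.
    given = {}
    for token in cmdline.split():
        if token.startswith("kasan.") and "=" in token:
            key, value = token[len("kasan."):].split("=", 1)
            given[key] = value

    mode = given.get("mode", default_mode)
    if mode not in MODES:
        raise ValueError(f"unknown kasan.mode: {mode}")

    # Mode-derived defaults, following the quoted text:
    #   stack: on for mode=full, otherwise off
    #   trap:  sync for mode=full, otherwise async
    #   fault: report
    resolved = {
        "mode": mode,
        "stack": "on" if mode == "full" else "off",
        "trap": "sync" if mode == "full" else "async",
        "fault": "report",
    }

    # Explicit parameters override the mode-derived defaults.
    for key, allowed in ALLOWED.items():
        if key in given:
            if given[key] not in allowed:
                raise ValueError(f"unknown kasan.{key}: {given[key]}")
            resolved[key] = given[key]
    return resolved


if __name__ == "__main__":
    # Production mode, but with stack collection and panic-on-fault enabled.
    print(resolve_kasan_params("console=ttyAMA0 kasan.mode=prod kasan.stack=on kasan.fault=panic"))
```

Keeping the mode as a source of defaults rather than a hard switch is
what gives the flexibility mentioned above: a vendor can start from
kasan.mode=prod and enable individual features without a rebuild.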