Re: [PATCH v4 00/17] khwasan: kernel hardware assisted address sanitizer

Nick Desaulniers <ndesaulniers@xxxxxxxxxx> · Fri, 6 Jul 2018 06:02:04 +0900

On Tue, Jul 3, 2018, 5:22 AM Evgenii Stepanov <eugenis@xxxxxxxxxx> wrote:
On Mon, Jul 2, 2018 at 12:21 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 2 Jul 2018 12:16:42 -0700 Evgenii Stepanov <eugenis@xxxxxxxxxx> wrote:

>

>> On Fri, Jun 29, 2018 at 7:41 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

>> > On Fri, 29 Jun 2018 14:45:08 +0200 Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:

>> >

>> >> >> What kind of memory consumption testing would you like to see?

>> >> >

>> >> > Well, 100kb or so is a teeny amount on virtually any machine.  I'm

>> >> > assuming the savings are (much) more significant once the machine gets

>> >> > loaded up and doing work?

>> >>

>> >> So with clean kernel after boot we get 40 kb memory usage. With KASAN

>> >> it is ~120 kb, which is 200% overhead. With KHWASAN it's 50 kb, which

>> >> is 25% overhead. This should approximately scale to any amounts of

>> >> used slab memory. For example with 100 mb memory usage we would get

>> >> +200 mb for KASAN and +25 mb with KHWASAN. (And KASAN also requires

>> >> quarantine for better use-after-free detection). I can explicitly

>> >> mention the overhead in %s in the changelog.

>> >>

>> >> If you think it makes sense, I can also make separate measurements

>> >> with some workload. What kind of workload should I use?

>> >

>> > Whatever workload people were running when they encountered problems

>> > with KASAN memory consumption ;)

>> >

>> > I dunno, something simple.  `find / > /dev/null'?

>> >

>>

>> Looking at a live Android device under load, slab (according to

>> /proc/meminfo) + kernel stack take 8-10% available RAM (~350MB).

>> Kasan's overhead of 2x - 3x on top of it is not insignificant.

>>

>

> (top-posting repaired.  Please don't)

>

> For a debugging, not-for-production-use feature, that overhead sounds

> quite acceptable to me.  What problems is it known to cause?

Not having this overhead enables near-production use - e.g. running a

kasan/khwasan kernel on a personal, daily-use device to catch bugs that

do not reproduce in test configuration. These are the ones that often

cost the most engineering time to track down.

CPU overhead is bad, but generally tolerable. RAM is critical, in our

experience. Once it gets low enough, OOM-killer makes your life

miserable.

This would be great, actually. It's hard to get internal testers to run KASAN builds on their daily devices. Even if we don't ship this in production, I would prefer to at least have internal testers using this build, since we have great panic reporting/collection.
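
For reference, the overhead arithmetic quoted above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, the 40/120/50 kb figures are the post-boot numbers from Andrey's mail, and the 350 MB projection assumes the overhead scales linearly with slab usage, which the thread describes as approximate.

```python
# Sketch of the overhead arithmetic from the thread. Function names are
# hypothetical; the input figures are the ones quoted in the discussion.

def overhead_percent(baseline_kb, with_sanitizer_kb):
    """Extra memory relative to the clean-kernel baseline, in percent."""
    return (with_sanitizer_kb - baseline_kb) / baseline_kb * 100.0


def projected_extra_mb(used_mb, percent):
    """Extra memory for a given usage, ASSUMING linear scaling."""
    return used_mb * percent / 100.0


BASELINE_KB = 40   # clean kernel after boot
KASAN_KB = 120     # same measurement with KASAN (approximate)
KHWASAN_KB = 50    # same measurement with KHWASAN

kasan_pct = overhead_percent(BASELINE_KB, KASAN_KB)
khwasan_pct = overhead_percent(BASELINE_KB, KHWASAN_KB)

# 100 MB is the example from the changelog discussion; 350 MB is the
# slab + kernel stack figure reported for a loaded Android device.
for used_mb in (100, 350):
    print(
        f"{used_mb} MB used: "
        f"KASAN +{projected_extra_mb(used_mb, kasan_pct):.1f} MB, "
        f"KHWASAN +{projected_extra_mb(used_mb, khwasan_pct):.1f} MB"
    )
```

Under those assumptions, the 350 MB device would need roughly 700 MB extra with KASAN versus under 90 MB with KHWASAN, which is the gap that matters once the OOM-killer gets involved.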