On Tue, May 16, 2017 at 10:49 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, May 15, 2017 at 11:23 PM, Joonsoo Kim <js1304@xxxxxxxxx> wrote:
>>> >
>>> > Hello, all.
>>> >
>>> > This is an attempt to reduce the memory consumption of KASAN. Please
>>> > see the following description for more information.
>>> >
>>> > 1. What is per-page shadow memory
>>>
>>> Hi Joonsoo,
>>
>> Hello, Dmitry.
>>
>>> First I need to say that this is great work. I wanted KASAN to consume
>>
>> Thanks!
>>
>>> 1/8-th of _kernel_ memory rather than total physical memory for a long
>>> time.
>>>
>>> However, this implementation does not work with inline instrumentation,
>>> and inline instrumentation is the main mode for KASAN. Outline
>>> instrumentation is merely a rudiment to support gcc 4.9, and it needs
>>> to be removed as soon as we stop caring about gcc 4.9 (do we at all?
>>> Is it the current compiler in any distro? Ubuntu 12 has 4.8, Ubuntu 14
>>> already has 5.4. And if you build gcc yourself or get a fresher
>>> compiler from somewhere else, you hopefully get something better than
>>> 4.9).
>>
>> Hmm... I don't think that outline instrumentation is something to be
>> removed. In the embedded world there is often a fixed partition table,
>> and enlarging the kernel binary would cause problems. Changing that
>> table is possible, but it is a really uncomfortable thing to do just to
>> debug something. So I think that outline instrumentation has its own
>> merit.
>
> Fair. Let's consider both as important.
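>
> For context: with outline instrumentation the compiler emits a call to
> __asan_load*()/__asan_store*() for every memory access, while with
> inline instrumentation it emits the shadow check directly into the
> generated code, which is why inline is faster but also produces a
> larger binary. Roughly, the check for an 8-byte load boils down to
> something like this (a sketch, not the exact gcc codegen; the wrapper
> function is purely illustrative):
>
>     /* Illustrative wrapper; gcc open-codes this at every 8-byte load. */
>     static __always_inline void sketch_check_load8(unsigned long addr)
>     {
>             s8 shadow = *(s8 *)((addr >> KASAN_SHADOW_SCALE_SHIFT)
>                                 + KASAN_SHADOW_OFFSET);
>
>             if (unlikely(shadow))   /* 0 means the whole granule is valid */
>                     __asan_report_load8_noabort(addr);
>     }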
>
>> Anyway, I have missed inline instrumentation completely.
>>
>> I will attach the fix at the bottom. It doesn't look beautiful,
>> since it breaks the layering (some checks will be done in the report
>> function). However, I think that it's a good trade-off.
>
> I can confirm that inline instrumentation works with that patch.
>
> I can also confirm that it reduces memory usage. I've booted qemu with
> 2G ram and run some fixed workload. Before:
> 31853 dvyukov   20   0 3043200 765464  21312 S 366.0  4.7  2:39.53 qemu-system-x86
>  7528 dvyukov   20   0 3043200 732444  21676 S 333.3  4.5  2:23.19 qemu-system-x86
> After:
>  6192 dvyukov   20   0 3043200 394244  20636 S  17.9  2.4  2:32.95 qemu-system-x86
>  6265 dvyukov   20   0 3043200 388860  21416 S 399.3  2.4  3:02.88 qemu-system-x86
>  9005 dvyukov   20   0 3043200 383564  21220 S 397.1  2.3  2:35.33 qemu-system-x86
>
> However, I see some very significant slowdowns with inline
> instrumentation. I did 3 tests:
>
> 1. Boot speed: I measured the time for a particular message to appear
> on the console. Before:
> [    2.504652] random: crng init done
> [    2.435861] random: crng init done
> [    2.537135] random: crng init done
> After:
> [    7.263402] random: crng init done
> [    7.263402] random: crng init done
> [    7.174395] random: crng init done
>
> That's a ~3x slowdown.
>
> 2. I've run the bench_readv benchmark:
> https://raw.githubusercontent.com/google/sanitizers/master/address-sanitizer/kernel_buildbot/slave/bench_readv.c
> as:
> while true; do time ./bench_readv bench_readv 300000 1; done
>
> Before:
> sys 0m7.299s
> sys 0m7.218s
> sys 0m6.973s
> sys 0m6.892s
> sys 0m7.035s
> sys 0m6.982s
> sys 0m6.921s
> sys 0m6.940s
> sys 0m6.905s
> sys 0m7.006s
>
> After:
> sys 0m8.141s
> sys 0m8.077s
> sys 0m8.067s
> sys 0m8.116s
> sys 0m8.128s
> sys 0m8.115s
> sys 0m8.108s
> sys 0m8.326s
> sys 0m8.529s
> sys 0m8.164s
> sys 0m8.380s
>
> This is a ~19% slowdown.
>
> 3. I've run the bench_pipes benchmark:
> https://raw.githubusercontent.com/google/sanitizers/master/address-sanitizer/kernel_buildbot/slave/bench_pipes.c
> as:
> while true; do time ./bench_pipes 10 10000 1; done
>
> Before:
> sys 0m5.393s
> sys 0m6.178s
> sys 0m5.909s
> sys 0m6.024s
> sys 0m5.874s
> sys 0m5.737s
> sys 0m5.826s
> sys 0m5.664s
> sys 0m5.758s
> sys 0m5.421s
> sys 0m5.444s
> sys 0m5.479s
> sys 0m5.461s
> sys 0m5.417s
>
> After:
> sys 0m8.718s
> sys 0m8.281s
> sys 0m8.268s
> sys 0m8.334s
> sys 0m8.246s
> sys 0m8.267s
> sys 0m8.265s
> sys 0m8.437s
> sys 0m8.228s
> sys 0m8.312s
> sys 0m8.556s
> sys 0m8.680s
>
> This is a ~52% slowdown.
>
> This does not look acceptable to me. I would be ready to pay, say, 10%
> of performance for this, but it seems that this can cause up to a 2-4x
> slowdown on some workloads.
>
> Your use-case is embedded devices where you care a lot about both code
> size and memory consumption, right?
>
> I see 2 possible ways forward:
> 1. Enable this new mode only for outline, but keep the current scheme
> for inline. Then outline becomes the "small but slow" type of
> configuration.
> 2. Somehow fix the slowness (at least in inline mode).
>
>> Mapping the zero page into non-kernel memory could cause a
>> false-negative problem, since we cannot flush the TLB on all cpus:
>> we would read a zero shadow value in that case even if the actual
>> shadow value is not zero. This is one of the reasons that the black
>> page is introduced in this patchset.
>
> What makes your current patch work, then?
> Say we map a new shadow page and update the per-page shadow to say
> that there is mapped shadow. Then another CPU loads the per-page
> shadow and then loads from the newly mapped shadow. If we don't flush
> the TLB, what makes the second CPU see the newly mapped shadow?

/\/\/\/\/\/\

Joonsoo, please answer this question above.

I am trying to understand if there is any chance to make mapping a
single page for all non-interesting shadow ranges work. That would be
a much simpler change that does not require changing the
instrumentation, and it would not force inline instrumentation onto a
slow path for some ranges (vmalloc?).
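To spell out the interleaving I mean, in pseudo-code (the per-page
shadow helpers here are invented names for illustration, not code from
the patch):

    /* CPU 0: publish a real shadow page over the shared zero page */
    static void cpu0_map_shadow(void *addr, struct page *new_page)
    {
            void *shadow = kasan_mem_to_shadow(addr);

            set_pte(lookup_shadow_pte(shadow),      /* hypothetical lookup */
                    mk_pte(new_page, PAGE_KERNEL));
            mark_per_page_shadow_mapped(addr);      /* hypothetical */

            /*
             * Nothing here forces other CPUs to drop a stale zero-page
             * translation. That would need something like
             * flush_tlb_kernel_range((unsigned long)shadow,
             * (unsigned long)shadow + PAGE_SIZE), i.e. IPIs to all cpus.
             */
    }

    /* CPU 1: checks an access to addr */
    static bool cpu1_shadow_is_poisoned(void *addr)
    {
            if (!per_page_shadow_is_mapped(addr))   /* hypothetical */
                    return false;
            /*
             * Even if the "mapped" state is already visible, this load
             * can still go through a stale TLB entry for the old zero
             * page and read 0, so a bad access passes the check.
             */
            return *(s8 *)kasan_mem_to_shadow(addr) != 0;
    }

Without a flush, I don't see what forces the second CPU's TLB to drop
the zero-page translation.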