On 2025-02-25 at 22:37:37 +0100, Andrey Konovalov wrote: >On Tue, Feb 25, 2025 at 6:16 PM Maciej Wieczor-Retman ><maciej.wieczor-retman@xxxxxxxxx> wrote: >> >> I mean in my tests, with setting offset in runtime, everything works correctly >> in inline mode. Even though hwasan-mapping-offset ends up empty and doesn't end >> up in CFLAGS_KASAN. I assume this means that the inline mode is pretty much the >> same as outline mode with the runtime offset setting? >> >> I also tested if hwasan-mapping-offset does anything if I passed random values >> to it by hardcoding them in the makefile and still everything seemed to work >> just fine. Therefore I assumed that this option doesn't have any effect on x86. > >Hm that's weird. I wonder if inline instrumentation somehow gets auto-disabled. > >> Hmm indeed it does. Then I'm not sure why I didn't crash when I started putting >> in random variables. I'll dive into assembly and see what's up in there. > >Please do, I'm curious what's going on there. I think I figured it out. After adding kasan_params += hwasan-instrument-with-calls=0 to Makefile.kasan just under kasan_params += hwasan-mapping-offset=$(KASAN_SHADOW_OFFSET) inline works properly in x86. I looked into assembly and before there were just calls to __hwasan_load/store. After adding the the hwasan-instrument-with-calls=0 I can see no calls and the KASAN offset is now inlined, plus all functions that were previously instrumented now have the kasan_check_range inlined in them. My LLVM investigation lead me to bool shouldInstrumentWithCalls(const Triple &TargetTriple) { return optOr(ClInstrumentWithCalls, TargetTriple.getArch() == Triple::x86_64); } which I assume defaults to "1" on x86? So even with inline mode it doesn't care and still does an outline version. I checked how arm64 reacts to adding the hwasan-instrument-with-calls=0 by cross compiling and I don't see any differences in output assembly. > >> But anyway I have an idea how to setup the x86 offset for tag-based mode so it >> works for both paging modes. I did some testing and value >> 0xffeffc0000000000 >> seems to work fine and has at least some of the benefits I was hoping for when >> doing the runtime_const thing. It works in both paging modes because in 5 levels >> it's just a little bit below the 0xffe0000000000000 that I was thinking about >> first and in 4 levels, because of LAM, it becomes 0xfffffc0000000000 (because in >> 4 level paging bits 62:48 are masked from address translation. So it's the same >> as the end of generic mode shadow memory space. >> >> The alignment doesn't fit the shadow memory size so it's not optimal but I'm not >> sure it can be if we want to have the inline mode and python scripts working at >> the same time. At the very least I think the KASAN_SHADOW_END won't collide with >> other things in the tab-based mode in 5 level paging mode, so no extra steps are >> needed (arch/x86/mm/kasan_init_64.c in kasan_init()). > >What do you mean by "The alignment doesn't fit the shadow memory size"? Maybe that's the wrong way to put it. I meant that KASAN_SHADOW_END and KASAN_SHADOW_END aren't aligned to the size of shadow memory. > >> Do you see any problems with this offset for x86 tag-based mode? > >I don't, but I think someone who understands the x86 memory layout >better needs to look at this. > >> Btw I think kasan_check_range() can be optimized on x86 if we use >> addr_has_metadata() that doesn't use KASAN_SHADOW_START. Getting rid of it from >> the implementation will remove pgtable_l5_enabled() which is pretty slow so >> kasan_check_range() which is called a lot would probably work much faster. >> Do you see any way in which addr_has_metadata() will make sense but won't use >> KASAN_SHADOW_START? Every one of my ideas ends up using pgtable_l5_enabled() >> because the metadata can have 6 or 15 bits depending on paging level. > >What if we turn pgtable_l5_enabled() into using a read-only static key >(DEFINE_STATIC_KEY_FALSE_RO) instead of a bool variable? Or if that is >not acceptable, we could cache its value in a KASAN-specific static >key. I think this was a false alarm, sorry. I asked Kirill about turning pgtable_l5_enabled() into a runtime_const value but it turns out it's already patched by alternative code during boot. I just saw a bunch more stuff there because I was looking at the assembly output and the code isn't patched there yet. -- Kind regards Maciej Wieczór-Retman