On 2025-02-26 at 20:44:35 +0100, Andrey Konovalov wrote:
>On Wed, Feb 26, 2025 at 5:43 PM Maciej Wieczor-Retman
><maciej.wieczor-retman@xxxxxxxxx> wrote:
>>
>> >What value can bit 63 take for _valid kernel_ pointers (on which
>> >KASAN is intended to operate)? If it is always 1, we could arguably
>> >change the compiler to do | 0xFE for CompileKernel. Which would leave
>> >us with only one region to check: [0xfe00000000000000,
>> >0xffffffffffffffff]. But I don't know whether changing the compiler
>> >makes sense: it technically does as instructed by the LAM spec.
>> >(Vitaly, any thoughts? For context: we are discussing how to check
>> >whether a pointer can be a result of a memory-to-shadow mapping
>> >applied to a potentially invalid pointer in kernel HWASAN.)
>>
>> With LAM, valid pointers need to have bits 63 and 56 equal for 5 level paging
>> and bits 63 and 47 equal for 4 level paging. Both set for kernel addresses and
>> both clear for user addresses.
>
>Ah, OK. Then I guess we could even change the compiler to do | 0xFF,
>same as arm. But I don't know if this makes sense.

I guess it wouldn't be resetting the tag anymore, just some agreed upon set of
bits. If this argument is just for the non_canonical_hook() purposes I suppose
we can leave it as is and check the two ranges in the kernel.

>
>> >With the way the compiler works right now, for the perfectly precise
>> >check, I think we need to check 2 ranges: [0xfe00000000000000,
>> >0xffffffffffffffff] for when bit 63 is set (of a potentially-invalid
>> >pointer to which memory-to-shadow mapping is to be applied) and
>> >[0x7e00000000000000, 0x7fffffffffffffff] for when bit 63 is reset. Bit
>> >56 ranges through [0, 1] in both cases.
>> >
>> >However, in these patches, you use only bits [60:57]. The compiler is
>> >not aware of this, so it still sets bits [62:57], and we end up with
>> >the same two ranges. But in the KASAN code, you only set bits [60:57],
>> >and thus we can end up with 8 potential ranges (2 possible values for
>> >each of the top 3 bits), which gets complicated. So checking only one
>> >range that covers all of them seems to be reasonable for simplicity
>> >even though not entirely precise. And yes, [0x1e00000000000000,
>> >0xffffffffffffffff] looks like what we need.
>>
>> Aren't the 2 ranges you mentioned in the previous paragraph still valid, no
>> matter what bits the __tag_set() function uses? I mean bits 62:57 are still
>> reset by the compiler so bits 62:61 still won't matter. For example addresses
>> 0x1e00000000000000 and 0x3e00000000000000 will resolve to the same thing after
>> the compiler is done with them right?
>
>Ah, yes, you're right, it's the same 2 ranges.
>
>I was thinking about the outline instrumentation mode, where the
>shadow address would be calculated based on resetting only bits
>[60:57]. But then there we have an addr_has_metadata() check in
>kasan_check_range(), so KASAN should not try to dereference a bad shadow
>address and thus should not reach kasan_non_canonical_hook() anyway.

Okay, so I guess we should do the same check for both arm64 and x86, right?
(And risc-v in the future.) Just use the wider range, in this case the 2 ranges
that x86 needs. Then it could look something like:

			// 0xffffffffffffffff maps just below the shadow offset
	if (addr > KASAN_SHADOW_OFFSET ||
			// and check below the most negative address
		(addr < kasan_mem_to_shadow(0xFEUL << 56) &&
			// biggest positive address that overflows so check both above it
		addr > kasan_mem_to_shadow(~0UL >> 1)) ||
			// smallest positive address but will overflow so check addresses below it
		addr < kasan_mem_to_shadow(0x7EUL << 56))
		return;

So the first two lines deal with the first range, and the next two lines deal
with the second one.

Or do you want me to make this part of non_canonical_hook() arch specific for
maximum accuracy?

-- 
Kind regards
Maciej Wieczór-Retman
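
For illustration, the two-range check discussed above can be modelled in user space. This is only a sketch under stated assumptions, not the kernel code: the `sketch_*` names are hypothetical, the shadow offset is an arbitrary value picked so that neither range wraps internally, the scale shift of 4 is assumed for the tag-based mode, and the memory-to-shadow mapping is assumed to use a sign-extending shift (which is what makes 0xffffffffffffffff land just below the offset).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the patches. */
#define SKETCH_SHADOW_SCALE_SHIFT 4
#define SKETCH_SHADOW_OFFSET UINT64_C(0xffeffc0000000000)

/* Model of a memory-to-shadow mapping with a sign-extending shift. */
static uint64_t sketch_mem_to_shadow(uint64_t addr)
{
	uint64_t shifted = addr >> SKETCH_SHADOW_SCALE_SHIFT;

	/* Sign-extend by hand so the result does not depend on how the
	 * compiler shifts negative signed values. */
	if (addr >> 63)
		shifted |= ~(UINT64_MAX >> SKETCH_SHADOW_SCALE_SHIFT);

	/* Unsigned addition wraps, matching the overflow mentioned above. */
	return shifted + SKETCH_SHADOW_OFFSET;
}

/*
 * Returns true if 'shadow' could be the result of mapping an address from
 * [0xfe00000000000000, 0xffffffffffffffff] or
 * [0x7e00000000000000, 0x7fffffffffffffff].
 * Assumes each range maps to a contiguous, non-wrapping shadow interval,
 * which holds for the offset chosen here.
 */
static bool sketch_shadow_is_plausible(uint64_t shadow)
{
	uint64_t hi_start = sketch_mem_to_shadow(UINT64_C(0xFE) << 56);
	uint64_t hi_end = sketch_mem_to_shadow(UINT64_MAX);
	uint64_t lo_start = sketch_mem_to_shadow(UINT64_C(0x7E) << 56);
	uint64_t lo_end = sketch_mem_to_shadow(UINT64_MAX >> 1);

	return (shadow >= hi_start && shadow <= hi_end) ||
	       (shadow >= lo_start && shadow <= lo_end);
}
```

The sketch uses inclusive bounds for both intervals and accepts a shadow address when it falls in either one, which is the complement of the early-return condition proposed above.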