On Tue, Sep 29, 2020 at 05:51:58PM +0200, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 4:24 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> > > From: Alexander Potapenko <glider@xxxxxxxxxx>
> > >
> > > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > > low-overhead sampling-based memory safety error detector of heap
> > > use-after-free, invalid-free, and out-of-bounds access errors.
> > >
> > > KFENCE is designed to be enabled in production kernels, and has near
> > > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > > for precision. The main motivation behind KFENCE's design is that with
> > > enough total uptime KFENCE will detect bugs in code paths not typically
> > > exercised by non-production test workloads. One way to quickly achieve a
> > > large enough total uptime is when the tool is deployed across a large
> > > fleet of machines.
> > >
> > > KFENCE objects each reside on a dedicated page, at either the left or
> > > right page boundaries. The pages to the left and right of the object
> > > page are "guard pages", whose attributes are changed to a protected
> > > state, and cause page faults on any attempted access to them. Such page
> > > faults are then intercepted by KFENCE, which handles the fault
> > > gracefully by reporting a memory access error. To detect out-of-bounds
> > > writes to memory within the object's page itself, KFENCE also uses
> > > pattern-based redzones. The following figure illustrates the page
> > > layout:
> > >
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > >    | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> > >    | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> > >    | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
> > >    | xxxxxxxxx | E : ZONE  | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> > >    | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> > >    | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > >
> > > Guarded allocations are set up based on a sample interval (can be set
> > > via kfence.sample_interval). After expiration of the sample interval, a
> > > guarded allocation from the KFENCE object pool is returned to the main
> > > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > > next allocation is set up after the expiration of the interval.
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map.
> For x86 these addresses belong to .bss, i.e. "kernel text mapping"
> section, isn't that the linear map?

No; the "linear map" is the "direct mapping" on x86, and the "image" or
"kernel text mapping" is a distinct VA region. The image mapping aliases
(i.e. uses the same physical pages as) a portion of the linear map, and
every page in the linear map has a struct page.

For the x86_64 virtual memory layout, see:

https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html

Originally, the kernel image lived in the linear map, but it was split
out into a distinct VA range (among other things) to permit KASLR. When
that split was made, the x86 virt_to_*() helpers were updated to detect
when they were passed a kernel image address, and automatically fix
that up as-if they'd been handed the linear map alias of that address.
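
Roughly, that fixup looks like the simplified sketch below (untested,
and not the exact kernel code; IIRC the real x86_64 helper,
__phys_addr(), does the same check with a branchless carry trick):

/*
 * Simplified sketch of the x86_64 fixup: if the address falls in the
 * kernel image ("text") mapping, translate it as if it were its linear
 * map alias.  __START_KERNEL_map, phys_base and PAGE_OFFSET are the
 * real x86_64 symbols; everything else is made up for illustration.
 */
static unsigned long virt_to_phys_sketch(unsigned long x)
{
	if (x >= __START_KERNEL_map)
		/* image address: offset within the image + physical load address */
		return x - __START_KERNEL_map + phys_base;

	/* otherwise treat it as a linear/direct map address */
	return x - PAGE_OFFSET;
}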
For going one-way from virt->{phys,page} that works ok, but it doesn't
survive the round-trip, and introduces redundant work into each
virt_to_*() call.

As it was largely arch code that was using image addresses, we didn't
bother with the fixup on arm64, as we preferred the stronger warning. At
the time I was also under the impression that on x86 they wanted to get
rid of the automatic fixup, but that doesn't seem to have happened.

Thanks,
Mark.
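
P.S. a rough, untested sketch of the failing round-trip, assuming an
x86_64 kernel; kfence_pool_sketch below is a made-up .bss array standing
in for the KFENCE pool, not the real thing:

#include <linux/mm.h>

/* Hypothetical .bss object, standing in for KFENCE's pool. */
static char kfence_pool_sketch[PAGE_SIZE] __aligned(PAGE_SIZE);

static void round_trip_sketch(void)
{
	void *image_addr  = kfence_pool_sketch;		/* kernel image ("text mapping") address */
	struct page *page = virt_to_page(image_addr);	/* the fixup finds the aliased physical page */
	void *linear_addr = page_address(page);		/* ...but this returns the linear map alias */

	/* The round-trip hands back the linear map address, not image_addr. */
	WARN_ON(linear_addr != image_addr);		/* expected to fire */
}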