On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Mon, Mar 30, 2020 at 04:22:24PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > > > > > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > > > > > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > > > > > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > > > > > In particular, it permits the segments to be mapped using level 2
> > > > > > block entries when using 4k pages, which is expected to result in less
> > > > > > TLB pressure.
> > > > > >
> > > > > > However, the mappings for the bulk of the kernel will use level 2
> > > > > > entries anyway, and the misaligned fringes are organized such that they
> > > > > > can take advantage of the contiguous bit, and use far fewer level 3
> > > > > > entries than would be needed otherwise.
> > > > > >
> > > > > > This makes the value of this feature dubious at best, and since it is not
> > > > > > enabled in defconfig or in the distro configs, it does not appear to be
> > > > > > in wide use either. So let's just remove it.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > > > > ---
> > > > > >  arch/arm64/Kconfig.debug | 13 -------------
> > > > > >  arch/arm64/include/asm/memory.h | 12 +-----------
> > > > > >  drivers/firmware/efi/libstub/arm64-stub.c | 8 +++-----
> > > > > >  3 files changed, 4 insertions(+), 29 deletions(-)
> > > > >
> > > > > Acked-by: Will Deacon <will@xxxxxxxxxx>
> > > > >
> > > > > But I would really like to go a step further and rip out the block mapping
> > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > >
> > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@xxxxxx
> > > > >
> > > >
> > > > I'm not sure I follow - is this about mapping parts of the static
> > > > kernel Image for non-coherent DMA?
> > >
> > > Sorry, it's not directly related to your patch, just that if we're removing
> > > options relating to kernel mappings then I'd be quite keen on effectively
> > > forcing page-granularity on the linear map, as is currently done by default
> > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > series above.
> > >
> >
> > Right. I don't remember seeing any complaints about
> > RODATA_FULL_DEFAULT_ENABLED, but maybe it's too early for that.
> >
> > > This series just reminded me of it because it's another
> > > "off-by-default-behaviour-for-block-mappings-probably-because-of-performance-
> > > but-never-actually-measured" type of thing which really just gets in the
> > > way.
> > >
> >
> > Well, even though I agree that the lack of actual numbers is a bit
> > disturbing here, I'd hate to penalize all systems even more than they
> > already are (due to ARCH_KMALLOC_MINALIGN == ARCH_DMA_MINALIGN) by
> > adding another workaround that is only needed on devices that have
> > non-coherent masters.
>
> Fair enough, but I'd still like to see some numbers. If they're compelling,
> then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> that doesn't really help the kconfig maze :(
>

Could we make this a runtime thing? E.g., remap the entire linear
region down to pages under stop_machine() the first time we probe a
device that uses non-coherent DMA?

(/me ducks)
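
To put rough numbers on the patch description's point about misaligned
fringes, here is a minimal userspace sketch (not kernel code) that splits a
segment into 2 MiB level 2 blocks, 64 KiB contiguous-bit groups (16 adjacent
4k pages) and single 4k pages. The helper count_mappings(), struct map_count
and the sample addresses are made up for illustration. It assumes a 4k
granule and that virtual and physical addresses share the same alignment; the
real arm64 page table code handles more cases than this.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K   0x1000ULL     /* one level 3 page entry */
#define SZ_64K  0x10000ULL    /* 16 x 4k pages sharing the contiguous bit */
#define SZ_2M   0x200000ULL   /* one level 2 block entry */

/* Hypothetical result type: how many mappings of each kind a range needs. */
struct map_count {
	uint64_t blocks;  /* 2 MiB level 2 blocks */
	uint64_t cont;    /* 64 KiB contiguous groups */
	uint64_t pages;   /* single 4 KiB pages */
};

/*
 * Walk [start, end) and at each step take the largest mapping size that is
 * both aligned at the current address and still fits before 'end'.
 * Both bounds must be 4k aligned.
 */
static struct map_count count_mappings(uint64_t start, uint64_t end)
{
	struct map_count c = { 0, 0, 0 };
	uint64_t addr = start;

	assert(start % SZ_4K == 0 && end % SZ_4K == 0);

	while (addr < end) {
		uint64_t left = end - addr;

		if (addr % SZ_2M == 0 && left >= SZ_2M) {
			c.blocks++;
			addr += SZ_2M;
		} else if (addr % SZ_64K == 0 && left >= SZ_64K) {
			c.cont++;
			addr += SZ_64K;
		} else {
			c.pages++;
			addr += SZ_4K;
		}
	}
	return c;
}
```

For a segment spanning 0x1f0000..0x412000, count_mappings() gives 1 block,
2 contiguous groups and 2 single pages, so the fringes take 4 mappings
instead of 34 separate 4k pages.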