On Thu, 2 Apr 2020 at 13:30, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@xxxxxxxxxx> wrote:
> > > > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@xxxxxxxxxx> wrote:
> > > > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@xxxxxxxxxx> wrote:
> > > > > > > But I would really like to go a step further and rip out the block mapping
> > > > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@xxxxxx
> > > > > >
> > > > > > I'm not sure I follow - is this about mapping parts of the static
> > > > > > kernel Image for non-coherent DMA?
> > > > >
> > > > > Sorry, it's not directly related to your patch, just that if we're removing
> > > > > options relating to kernel mappings then I'd be quite keen on effectively
> > > > > forcing page-granularity on the linear map, as is currently done by default
> > > > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > > > series above.
>
> Have we ever hit this issue in practice? At least from the CPU
> perspective, we've assumed that a non-cacheable access would not hit in
> the cache. Reading the ARM ARM rules, it doesn't seem to state this
> explicitly but we can ask for clarification (I dug out an email from
> 2015, left unanswered).
>

There is some wording in D4.4.5 (Behavior of caches at reset) that
suggests that implementations may permit cache hits in regions that
are mapped Non-cacheable (although the paragraph in question talks
about global controls and not page table attributes)

> Assuming that the CPU is behaving as we'd expect, are there other issues
> with peripherals/SMMU?
>

There is the NoSnoop PCIe issue as well: PCIe masters that are DMA
coherent in general can generate transactions with non-cacheable
attributes. I guess this is mostly orthogonal, but I'm sure it would
be much easier to reason about correctness if it is guaranteed that no
mappings with mismatched attributes exist anywhere.

> > > Fair enough, but I'd still like to see some numbers. If they're compelling,
> > > then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> > > that doesn't really help the kconfig maze :(
>
> I'd prefer not to have a config option, we could easily break single
> Image at some point.
>
> > Could we make this a runtime thing? E.g., remap the entire linear
> > region down to pages under stop_machine() the first time we probe a
> > device that uses non-coherent DMA?
>
> That could be pretty expensive at run-time. With the ARMv8.4-TTRem
> feature, I wonder whether we could do this lazily when allocating
> non-coherent DMA buffers.
>
> (I still hope there isn't a problem at all with this mismatch ;)).
>

Now that we have the pieces to easily remap the linear region down to
pages, and [apparently] some generic infrastructure to manage the
linear aliases, the only downside is the alleged performance hit
resulting from increased TLB pressure. This is obviously highly
micro-architecture dependent, but with Xgene1 and ThunderX1 out of the
picture, I wonder if the tradeoffs are different now. Maybe by now, we
should just suck it up (Note that we had no complaints afaik regarding
the fact that we map the linear map down to pages by default now)
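
To put a rough number on the TLB-pressure concern, here is a
back-of-the-envelope sketch comparing how much of the linear map a TLB
can cover with 4 KiB pages versus 2 MiB blocks (the block size for a
4 KiB granule). The TLB size (1024 entries) and the 1 GiB working set
are made-up illustrative values, not measurements from any real core,
and the function names are hypothetical. Real implementations may also
coalesce contiguous entries, which this ignores.

```python
# Back-of-the-envelope TLB reach at a given mapping granularity.
# All concrete numbers below are assumptions chosen for illustration.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, granule_bytes: int) -> int:
    """Bytes of address space covered when every TLB entry maps one granule."""
    return entries * granule_bytes


def entries_needed(region_bytes: int, granule_bytes: int) -> int:
    """TLB entries required to cover region_bytes, rounded up."""
    return -(-region_bytes // granule_bytes)


if __name__ == "__main__":
    tlb_entries = 1024      # hypothetical TLB size
    working_set = 1 * GIB   # hypothetical working set in the linear map
    for name, granule in (("4 KiB page", 4 * KIB), ("2 MiB block", 2 * MIB)):
        reach = tlb_reach(tlb_entries, granule)
        need = entries_needed(working_set, granule)
        print(f"{name}: reach {reach // MIB} MiB, "
              f"{need} entries to cover {working_set // GIB} GiB")
```

Under these assumptions the same TLB covers 512 times less memory with
page mappings than with block mappings, so whether that matters in
practice depends on the working set and on the micro-architecture, as
noted above.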