On Wed, Apr 24, 2019 at 4:23 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Apr 24, 2019 at 12:45:56PM +0000, Gary Guo wrote:
> > The RISC-V privileged spec is explicitly designed to allow the
> > techniques described above (this is the sole purpose of MSTATUS.TVM). It
> > might be as high performance as a hardware with H-extension, but is
> > definitely a legit use case. In fact, it is vital for use cases like
> > recursive virtualization.
> >
> > Also, I believe the PTE format of RISC-V is already frozen -- therefore
> > it is impossible now to merge GLOBAL and USER bit, nor to replace RSW
> > bit with another bit.
>
> Yes, I do not think we can just repurpose a bit. Even using a currently
> unused one would require some gymnastics.
>
> That being said IFF we want to support non-coherent DMA (and I think we
> do as people glue together their SOCs using shoestring and paper clips,
> as already demonstrated by Andes and C-SKY in RISC-V space, and most
> arm, mips and ppc SOCs) we need something like this flag. The current
> RISC-V method that only allows M-mode to set up such attributes on
> a small number or PMP regions just doesn't work well with the way how
> Linux and most non-trivial OSes implement DMA memory allocations.
>
> Note that I said well - in theory we can have a firmware provided
> uncached pool - that is what Linux does on most nommu (that is without
> pagetables) ports, but the fixed sized pool really does suck and will
> make users very unhappy.

You could probably get away with allowing uncached mappings only for
huge pages, and using one or two of the bits in the PMD for it.
This should cover most use cases, since in practice coherent allocations
tend to be either small and rare (device descriptors) or very big
(frame buffer etc), and both cases can be handled with hugepages
and gen_pool_alloc, possibly with CMA added in, since there will likely
not be an IOMMU either on the systems that lack cache coherent DMA.
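
To illustrate the "small and rare" case, here is a rough user-space
sketch of carving small allocations out of a single huge-page-sized
uncached region, in the spirit of gen_pool_alloc. It is not the kernel
genalloc code: struct uncached_pool, pool_init(), pool_alloc() and
pool_free() are hypothetical names, and the 2 MiB region size and
256-byte granule are assumptions picked for the example. It uses one
byte per granule to keep the bookkeeping short, where a real allocator
would more likely use a bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE   (2u << 20)               /* assumed: one 2 MiB huge page */
#define CHUNK_SHIFT 8                        /* assumed: 256-byte granule */
#define NCHUNKS     (POOL_SIZE >> CHUNK_SHIFT)

/* Hypothetical pool covering one uncached huge-page mapping. */
struct uncached_pool {
	uintptr_t base;                      /* start address of the mapping */
	uint8_t used[NCHUNKS];               /* 1 = granule is allocated */
};

/* Round a byte count up to a whole number of granules. */
static size_t pool_chunks(size_t size)
{
	return (size + (1u << CHUNK_SHIFT) - 1) >> CHUNK_SHIFT;
}

static void pool_init(struct uncached_pool *p, uintptr_t base)
{
	p->base = base;
	memset(p->used, 0, sizeof(p->used));
}

/* First-fit search for a free run; returns 0 when nothing fits. */
static uintptr_t pool_alloc(struct uncached_pool *p, size_t size)
{
	size_t need = pool_chunks(size);
	size_t run = 0;

	if (need == 0 || need > NCHUNKS)
		return 0;

	for (size_t i = 0; i < NCHUNKS; i++) {
		run = p->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t start = i + 1 - need;

			memset(&p->used[start], 1, need);
			return p->base + (start << CHUNK_SHIFT);
		}
	}
	return 0;
}

/* The caller passes the same size it allocated, as with gen_pool_free(). */
static void pool_free(struct uncached_pool *p, uintptr_t addr, size_t size)
{
	assert(addr >= p->base && addr - p->base < POOL_SIZE);
	memset(&p->used[(addr - p->base) >> CHUNK_SHIFT], 0, pool_chunks(size));
}
```

The very big allocations (frame buffers) would not go through a pool
like this; they would take whole huge pages directly.
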
One downside is that you need a little more care for drivers that use
dma_mmap_coherent() to expose coherent buffers to user space.

Two other points about the proposal:
- Aside from completely uncached/unbuffered mappings, you typically
  want uncached/buffered mappings to cover dma_alloc_wc(), which is
  typically used for frame buffers etc. that need write-combining
  to get acceptable performance
- you need to decide what is supposed to happen when there are
  multiple conflicting mappings for the same physical address.

      Arnd