Hi Arnd,

On Thu, Apr 25, 2019 at 11:50:11AM +0200, Arnd Bergmann wrote:
> On Wed, Apr 24, 2019 at 4:23 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > On Wed, Apr 24, 2019 at 12:45:56PM +0000, Gary Guo wrote:
> > > The RISC-V privileged spec is explicitly designed to allow the
> > > techniques described above (this is the sole purpose of MSTATUS.TVM). It
> > > might be as high performance as a hardware with H-extension, but is
> > > definitely a legit use case. In fact, it is vital for use cases like
> > > recursive virtualization.
> > >
> > > Also, I believe the PTE format of RISC-V is already frozen -- therefore
> > > it is impossible now to merge GLOBAL and USER bit, nor to replace RSW
> > > bit with another bit.
> >
> > Yes, I do not think we can just repurpose a bit. Even using a currently
> > unused one would require some gymnastics.
> >
> > That being said IFF we want to support non-coherent DMA (and I think we
> > do as people glue together their SOCs using shoestring and paper clips,
> > as already demonstrated by Andes and C-SKY in RISC-V space, and most
> > arm, mips and ppc SOCs) we need something like this flag. The current
> > RISC-V method that only allows M-mode to set up such attributes on
> > a small number or PMP regions just doesn't work well with the way how
> > Linux and most non-trivial OSes implement DMA memory allocations.
> >
> > Note that I said well - in theory we can have a firmware provided
> > uncached pool - that is what Linux does on most nommu (that is without
> > pagetables) ports, but the fixed sized pool really does suck and will
> > make users very unhappy.
>
> You could probably get away with allowing uncached mappings only
> for huge pages, and using one or two of the bits the PMD for it.
> This should cover most use cases, since in practice coherent allocations
> tend to be either small and rare (device descriptors) or very big
> (frame buffer etc), and both cases can be handled with hugepages
> and gen_pool_alloc, possibly CMA added in since there will likely
> not be an IOMMU either on the systems that lack cache coherent DMA.

Generally, the attributes in a huge-tlb-entry and a leaf-tlb-entry should
be the same. Putting the _PAGE_CACHE and _PAGE_BUF bits only in the
huge-tlb-entry sounds a bit strange.

The gen_pool_alloc pool is only 256KB by default, but a huge tlb entry
covers 4MB. The hardware can't set up a TLB entry that maps a 4MB virtual
range onto a 256KB physical range.

>
> One downside is that you need a little more care for drivers that
> use dma_mmap_coherent() to expose coherent buffers to user space.
>
> Two other points about the proposal:
> - Aside from completely uncached/unbuffered mappings, you typically
>   want uncached/buffered mappings to cover dma_alloc_wc() that is
>   typically used for frame buffers etc that need write-combining to get
>   acceptable performance

I agree dma_alloc_wc is necessary, and we need to add one more attribute
bit to the PTE: _PAGE_BUF. Perhaps using _PAGE_BUF + _PAGE_CACHE is better
than _PAGE_CONHENCY.

> - you need to decide what is supposed to happen when there are
>   multiple conflicting mappings for the same physical address.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What do you mean by multiple conflicting mappings?

Best Regards
Guo Ren
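The two-bit idea (_PAGE_BUF + _PAGE_CACHE instead of a single bit) can be pictured as a small truth table: cached and buffered for normal memory, uncached but buffered for dma_alloc_wc(), and uncached and unbuffered for dma_alloc_coherent(). The C sketch below is only an illustration under stated assumptions: the bit positions (60 and 59) and the helper names mem_kind, mem_kind_attrs() and pte_set_mem_kind() are hypothetical, and none of this is the RISC-V PTE format or existing kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL PTE attribute bits: these positions are made up for
 * illustration only and are not defined by the RISC-V privileged spec.
 */
#define _PAGE_CACHE     (UINT64_C(1) << 60) /* hypothetical: cacheable */
#define _PAGE_BUF       (UINT64_C(1) << 59) /* hypothetical: bufferable (write-combining allowed) */
#define _PAGE_ATTR_MASK (_PAGE_CACHE | _PAGE_BUF)

/* Hypothetical classification of what kind of mapping a caller wants. */
enum mem_kind {
	MEM_NORMAL,       /* regular memory: cached + buffered */
	MEM_DMA_WC,       /* dma_alloc_wc()-style buffer: uncached, buffered */
	MEM_DMA_COHERENT, /* dma_alloc_coherent()-style buffer: uncached, unbuffered */
};

/* Return the (hypothetical) attribute bits for one kind of mapping. */
static uint64_t mem_kind_attrs(enum mem_kind kind)
{
	switch (kind) {
	case MEM_NORMAL:
		return _PAGE_CACHE | _PAGE_BUF;
	case MEM_DMA_WC:
		return _PAGE_BUF;
	case MEM_DMA_COHERENT:
	default:
		return 0;
	}
}

/* Clear any attribute bits already in @pte, then set the ones for @kind. */
static uint64_t pte_set_mem_kind(uint64_t pte, enum mem_kind kind)
{
	return (pte & ~_PAGE_ATTR_MASK) | mem_kind_attrs(kind);
}
```

For example, starting from a leaf PTE value of 0xCF (V, R, W, X, A and D set), pte_set_mem_kind(0xCF, MEM_DMA_WC) keeps those permission bits and adds only _PAGE_BUF. With two independent bits the fourth combination (cached but unbuffered) is simply not produced here; a real design would have to decide whether it means anything.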