Hi On 2016년 11월 10일 18:51, Brian Starkey wrote: > Hi Jaewon, > > On Thu, Nov 10, 2016 at 10:41:43AM +0900, Jaewon Kim wrote: >> Hi >> >> On 2016년 11월 09일 19:23, Brian Starkey wrote: >>> Hi, >>> >>> On Wed, Nov 09, 2016 at 06:47:26PM +0900, Jaewon Kim wrote: >>>> >>>> >>>> On 2016년 11월 09일 18:27, Brian Starkey wrote: >>>>> Hi Jaewon, >>>>> >>>>> On Wed, Nov 09, 2016 at 06:10:09PM +0900, Jaewon Kim wrote: >>>>>> Commit 6b03ae0d42bf (drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MA) >>>>>> added MEMREMAP_WC for DMA_MEMORY_MAP. If, however, CPU cache can be used on >>>>>> DMA_MEMORY_MAP, I think MEMREMAP_WC can be changed to MEMREMAP_WB. On my local >>>>>> ARM device, memset in dma_alloc_from_coherent sometimes takes much longer with >>>>>> MEMREMAP_WC compared to MEMREMAP_WB. >>>>>> >>>>>> Test results on AArch64 by allocating 4MB with putting trace_printk right >>>>>> before and after memset. >>>>>> MEMREMAP_WC : 11.0ms, 5.7ms, 4.2ms, 4.9ms, 5.4ms, 4.3ms, 3.5ms >>>>>> MEMREMAP_WB : 0.7ms, 0.6ms, 0.6ms, 0.6ms, 0.6ms, 0.5ms, 0.4 ms >>>>>> >>>>> >>>>> This doesn't look like a good idea to me. The point of coherent memory >>>>> is to have it non-cached, however WB will make writes hit the cache. >>>>> >>>>> Writing to the cache is of course faster than writing to RAM, but >>>>> that's not what we want to do here. >>>>> >>>>> -Brian >>>>> >>>> Hi Brian >>>> >>>> Thank you for your comment. >>>> If allocated memory will be used by TZ side, however, I think cacheable >>>> also can be used to be fast on memset in dma_alloc_from_coherent. >>> >>> Are you trying to share the buffer between the secure and non-secure >>> worlds on the CPU? In that case, I don't think caching really helps >>> you. I'm not a TZ expert, but I believe the two worlds can never >>> share cached data. >> I do not want memory sharing between the secure and non-secure worlds. >> I just want faster allocation. > > So now I'm confused. Why did you mention TZ? > > Could you explain what your use-case for the buffer you are allocating > is? I wanted to send physically contiguous memory to TZapp size at Linux runtime. And during discussion I realized I need to consider more if dma-coherent is proper approach only for TZapp. But if another DMA master is joined, l think we can think this issue again. Like secure HW device get memory from dma-coherent and it passes to TZapp. > >> >> I am not a TZ expert, either. I also think they cannot share cached data. >> As far as I know secure world can decide its cache policy with secure >> world page table regardless of non-secure world. >>> If you want the secure world to see the non-secure world's data, as >>> far as I know you will need to clean the cache in the non-secure world >>> to make sure the secure world can see it (and vice-versa). I'd expect >>> this to remove most of the speed advantage of using WB in the first >>> place, except for some possible speedup from more efficient bursting. >> Yes I also think non-secure world need to clean the cache before secure world >> access the memory region to avoid invalid data issue. But if other software >> like Linux driver or hypervisor do the cache cleaning, or engineer confirm, >> then we may be able to use MEMREMAP_WB or just skipping memset for faster >> memory allocation in dma_alloc_from_coherent. > > Skipping the memset doesn't sound like a good plan, you'd potentially > be leaking information to whatever device receives the memory. And > adding WB mappings to an API which is intended to be used for sharing > buffers with DMA masters doesn't sound like a good move either. > >>> >>> If you're sharing the buffer with other DMA masters, regardless of >>> secure/non-secure you're not going to want WB mappings. >>> >> If there is a scenario where another DMA master works on this memory, >> an engineer, I think, need to consider cache clean if he/she uses WB. > > The whole point of dma-coherent memory is for use by DMA masters. > >>>> How do you think to add another flag to distinguish this case? >>> >>> You could look into the streaming DMA API. It will depend on the exact >>> implementation, but at some point you're still going to have to pay >>> the penalty of syncing the CPU and device. >>> >>> -Brian >>> >> I cannot find DMA API and flag for WB. So I am considering additional flag >> to meet my request. In my opinion the flag can be either WB or non-zeroing. > > To me, it sounds like dma-coherent isn't the right tool to achieve > what you want, but I'm not clear on exactly what it is you're trying > to do (I know you want faster allocations, the point is what for?) > I actually looking into enum dma_attr which has DMA_ATTR_SKIP_ZEROING. If dma_alloc_attrs in arch/arm64/include/asm/dma-mapping.h passes struct dma_attrs *attrs to dma_alloc_from_coherent, I think I can do what I want. Thank you for your comment. > -Brian > >> >> For case #1 - DMA_MEMORY_MAP_WB >> --- a/drivers/base/dma-coherent.c >> +++ b/drivers/base/dma-coherent.c >> @@ -32,7 +32,9 @@ static bool dma_init_coherent_memory( >> if (!size) >> goto out; >> >> - if (flags & DMA_MEMORY_MAP) >> + if (flags & DMA_MEMORY_MAP_WB) >> + mem_base = memremap(phys_addr, size, MEMREMAP_WB); >> + else if (flags & DMA_MEMORY_MAP) >> mem_base = memremap(phys_addr, size, MEMREMAP_WC); >> else >> mem_base = ioremap(phys_addr, size); >> >> For case #2 - DMA_MEMORY_MAP_NOZEROING >> --- a/drivers/base/dma-coherent.c >> +++ b/drivers/base/dma-coherent.c >> @@ -190,6 +190,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size, >> *ret = mem->virt_base + (pageno << PAGE_SHIFT); >> dma_memory_map = (mem->flags & DMA_MEMORY_MAP); >> spin_unlock_irqrestore(&mem->spinlock, flags); >> + if (mem->flags & DMA_MEMORY_MAP_NOZEROING) >> + return 1; >> if (dma_memory_map) >> memset(*ret, 0, size); >> else >> >> Can I get your comment? >> >> Thank you >> Jaewon Kim >> >>>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@xxxxxxxxxxx> >>>>>> --- >>>>>> drivers/base/dma-coherent.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c >>>>>> index 640a7e6..0512a1d 100644 >>>>>> --- a/drivers/base/dma-coherent.c >>>>>> +++ b/drivers/base/dma-coherent.c >>>>>> @@ -33,7 +33,7 @@ static bool dma_init_coherent_memory( >>>>>> goto out; >>>>>> >>>>>> if (flags & DMA_MEMORY_MAP) >>>>>> - mem_base = memremap(phys_addr, size, MEMREMAP_WC); >>>>>> + mem_base = memremap(phys_addr, size, MEMREMAP_WB); >>>>>> else >>>>>> mem_base = ioremap(phys_addr, size); >>>>>> if (!mem_base) >>>>>> -- >>>>>> 1.9.1 >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>