On Wednesday 21 May 2014 14:56:36 Catalin Marinas wrote:
> On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:
> > On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:
> > > On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:
> > > > On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:
> > > > > On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
> > > > > > On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
> > > > > > > On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
> > > > > > > > On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
> > > > > > We probably want to default to 32-bit for arm32 in the absence
> > > > > > of dma-ranges. For arm64, I'd prefer if we could always mandate
> > > > > > dma-ranges to be present for each bus, just like we mandate
> > > > > > ranges to be present. I hope it's not too late for that.
> > > > > >
> > > > > > dma_set_mask should definitely look at the dma-ranges
> > > > > > properties, and the helper that Santosh just introduced should
> > > > > > give us all the information we need. We just need to decide on
> > > > > > the correct behavior.
> > > > >
> > > > > Last time I looked at Santosh's patches I thought the dma-ranges
> > > > > property was per device rather than per bus. We could make it per
> > > > > bus only and let the device call dma_set_mask() explicitly if it
> > > > > wants to restrict it further.
> > > >
> > > > Can you check again? I read the code again yesterday to check this,
> > > > and I concluded it was correctly doing this per bus.
> > >
> > > You are right, I missed the fact that of_dma_get_range() checks the
> > > dma-ranges property in the node's parent.
> > >
> > > So what we need is to set the default dma mask based on the size
> > > information in dma-ranges, something like this:
> > >
> > > 	mask = rounddown_pow_of_two(size) - 1;
> > > 	dma_set_mask(dev, mask); /* or dma_set_mask_and_coherent() */
> >
> > I don't think we should be calling dma_set_mask here, since that would
> > just go and parse the same property again once we fix the
> > implementation.
> >
> > Since this is a low-level helper, we can probably just assign the dma
> > mask directly.
>
> I was thinking of calling it in of_dma_configure() as that's where we
> already set the default dma_mask and coherent_dma_mask. Default to
> 32-bit if no dma-ranges are present.

Right. Actually it should also be capped to 32-bit, to allow
compatibility with drivers that don't call dma_set_mask and can't do
64-bit DMA. This is the normal behavior for PCI drivers: they need to
set a 64-bit mask explicitly and check the result.
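Roughly what I have in mind for of_dma_configure(), as a sketch only:
of_dma_get_range() is the helper from Santosh's series, everything else
here is made up for illustration, not the actual implementation:

	#include <linux/device.h>
	#include <linux/dma-mapping.h>
	#include <linux/kernel.h>
	#include <linux/log2.h>
	#include <linux/of_address.h>

	/* Sketch: derive the default mask from dma-ranges, capped at 32-bit. */
	static void of_dma_setup_default_mask(struct device *dev)
	{
		u64 dma_addr, paddr, size;
		u64 mask = DMA_BIT_MASK(32);	/* default when dma-ranges is absent */

		if (!of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size))
			mask = rounddown_pow_of_two(size) - 1;

		/*
		 * Cap at 32 bits so drivers that never call dma_set_mask()
		 * keep working; 64-bit capable drivers have to raise the
		 * mask themselves and check the result, as on PCI.
		 */
		mask = min_t(u64, mask, DMA_BIT_MASK(32));

		dev->coherent_dma_mask = mask;
		if (!dev->dma_mask)
			dev->dma_mask = &dev->coherent_dma_mask;
		*dev->dma_mask = mask;
	}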
> > > > > > > > While we currently don't have a set of swiotlb DMA ops on
> > > > > > > > ARM32, we do have it on ARM64, and I think we should be
> > > > > > > > using them properly. It should really not be hard to
> > > > > > > > implement a proper dma_set_mask() function for ARM64 that
> > > > > > > > is able to set up the swiotlb based on the dma-ranges
> > > > > > > > properties and always returns success but leaves the mask
> > > > > > > > unchanged.
> > > > > > >
> > > > > > > The swiotlb bounce buffer needs to be pre-allocated at boot,
> > > > > > > otherwise we don't have any guarantees. Since we can't honour
> > > > > > > random masks anyway, we stick to ZONE_DMA, which is currently
> > > > > > > within the 4G limit. But the driver calls dma_set_mask() too
> > > > > > > late for any further swiotlb setup.
> > > > > > >
> > > > > > > With IOMMU we can be more flexible around dma_set_mask(); it
> > > > > > > can be done at run-time.
> > > > > >
> > > > > > What we can do with swiotlb is to check if the mask is smaller
> > > > > > than ZONE_DMA. If it ever is, we have to fail dma_set_mask and
> > > > > > hope the driver can fall back to PIO mode, or it will have to
> > > > > > fail its probe() function.
> > > > >
> > > > > dma_set_(coherent_)mask() checks swiotlb_dma_supported(), which
> > > > > returns false if io_tlb_end goes beyond the device mask. So we
> > > > > just need to ensure that io_tlb is allocated within ZONE_DMA.
> > > >
> > > > Makes sense for dma_set_mask. Why do you do the same thing for
> > > > coherent_mask? Shouldn't that check against ZONE_DMA instead?
> > >
> > > It depends on the meaning of coherent_dma_mask. As I understand it,
> > > that's the dma mask used by dma_alloc_coherent() to return a memory
> > > buffer that the device is able to access. I don't see it as much
> > > different from the dma_mask used by the streaming API. I guess some
> > > hardware could have different masks here if they have cache
> > > coherency only for certain memory ranges (and the coherent_dma_mask
> > > would be smaller than dma_mask).
> >
> > The point I was trying to make is a different one: for the streaming
> > mapping, you can allow any mask as long as the swiotlb is able to
> > create a bounce buffer.
>
> dma_mask is a hardware parameter and is used by the streaming DMA API
> implementation (which could be swiotlb) to decide whether to bounce or
> not. The streaming DMA API can take any CPU address as input,
> independent of the dma_mask, and bounce accordingly. If it doesn't have
> a bounce buffer matching the dma_mask, dma_supported() would return
> false.
>
> > This does not work for the coherent mapping, because that by
> > definition cannot use bounce buffers.
>
> Yes, but most of the time these masks have the same value.

Let me start again with an example:

	io_tlb_end = 0x1000000;		// 16 MB
	dma_mask = 0x10000000;		// 256 MB
	ZONE_DMA = 0x100000000;		// 4 GB
	max_pfn = 0x1000000000;		// 64 GB

The device driver has set dma_mask and coherent_dma_mask to 0x10000000
(256 MB) because of a limitation in the hardware. This succeeded because
io_tlb_end is well within the dma mask. Since the coherent_dma_mask is
smaller than max_pfn, the swiotlb version of dma_alloc_coherent then
allocates the buffer using GFP_DMA. However, that likely returns an
address above dma_mask/coherent_dma_mask.

My point was that to fix this case, you must not compare the
coherent_dma_mask requested by the device against io_tlb_end but against
the top of ZONE_DMA.
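As a sketch of the distinction (zone_dma_limit is a stand-in for
whatever symbol ends up describing the top of ZONE_DMA, and the function
names are invented for illustration):

	#include <linux/device.h>
	#include <linux/dma-mapping.h>
	#include <linux/errno.h>
	#include <linux/swiotlb.h>

	static u64 zone_dma_limit = DMA_BIT_MASK(32);	/* assuming a 4 GB ZONE_DMA */

	/* streaming mask: anything works as long as we can bounce into io_tlb */
	static int example_set_dma_mask(struct device *dev, u64 mask)
	{
		if (!swiotlb_dma_supported(dev, mask))	/* compares against io_tlb_end */
			return -EIO;
		*dev->dma_mask = mask;
		return 0;
	}

	/*
	 * coherent mask: dma_alloc_coherent() cannot bounce, and GFP_DMA
	 * may return anything below the top of ZONE_DMA, so the mask has
	 * to cover all of ZONE_DMA rather than just the io_tlb buffer.
	 */
	static int example_set_coherent_mask(struct device *dev, u64 mask)
	{
		if (mask < zone_dma_limit)
			return -EIO;
		dev->coherent_dma_mask = mask;
		return 0;
	}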
> > > > > > For dma_set_coherent_mask(), we also have to fail any call
> > > > > > that tries to set a mask larger than what the device hardware
> > > > > > can do. Unlike that, dma_set_mask() can succeed with any mask;
> > > > > > we just have to enable swiotlb if the mask that the driver
> > > > > > wants is larger than what the hardware can do.
> > > > >
> > > > > Currently we can't satisfy any arbitrarily small dma mask even
> > > > > with swiotlb, since the bounce buffer is only guaranteed to be
> > > > > in ZONE_DMA. Swiotlb allows for smaller masks, but we would need
> > > > > to reserve the io_tlb buffer early during boot and at smaller
> > > > > addresses. For example, swiotlb_alloc_coherent() first tries
> > > > > __get_free_pages(GFP_DMA) and, if the coherent_dma_mask isn't
> > > > > matched, it frees the pages and falls back to the io_tlb buffer.
> > > > > However, I don't think it's worth going for masks smaller than
> > > > > 32-bit on arm64.
> > > >
> > > > Is that safe for noncoherent systems? I'd expect the io_tlb buffer
> > > > to be cached there, which means we can't use it for coherent
> > > > allocations.
> > >
> > > For non-coherent systems, the current arm64 approach is to take the
> > > address returned by swiotlb_alloc_coherent() and remap it as Normal
> > > NonCacheable (see the arm64 __dma_alloc_noncoherent()). The CPU
> > > should only access the dma buffer via the non-cacheable mapping.
> >
> > Doesn't that require the allocated page to be mapped using small
> > pages in the linear mapping?
>
> Not in the linear mapping but in the vmalloc space (for arm64).

Ok.

> > As far as I understand it, you are not allowed to create a
> > noncacheable mapping for a page that also has a cacheable mapping, or
> > did that change between armv7 and armv8?
>
> No change between ARMv7 and ARMv8. In fact, ARMv7 has clarified what
> "unpredictable" means when you mix different attributes. Basically, we
> can mix aliases of the same memory type (Normal Cacheable vs Normal
> NonCacheable), but if the cacheability differs, we need to do some
> cache maintenance before accessing the non-cacheable mapping for the
> first time (to clear potentially dirty lines that could be evicted
> randomly). Any cache lines loaded speculatively via the cacheable
> mapping would not be hit when accessed via the non-cacheable one.

Ah, only for the first access? I had remembered this differently, thanks
for the clarification.

> > For CMA, we use the trick of changing the mapping in place, but that
> > doesn't work for normal pages, which are mapped using huge TLB
> > entries.
>
> That's true for arm32 and we could do the same for arm64, though I
> don't think it's worth it because we would have to break very large
> mappings (e.g. 1GB). We have plenty of VA space and we just create a
> new non-coherent mapping (this latter part could be optimised, even
> generically, to use huge pages where possible).

I thought we specifically did this on arm32 to avoid the duplicate
mapping, because we had concluded that it would be broken even with the
updated ARMv7 definition.

> > > Of course, another approach would be to change the cacheability
> > > attributes of the io_tlb buffer (or the CMA one), but the current
> > > approach has the advantage that the io_tlb buffer can be used for
> > > both coherent and non-coherent devices.
> >
> > That definitely shouldn't be done without performance testing.
> > However, I wonder whether any such measurements were done for the
> > swiotlb in the first place: it's possible that it's cheaper to map
> > the bounce buffer for noncoherent devices as noncacheable, so we do a
> > cache-bypassing copy into and out of the bounce buffer and save the
> > extra flushes.
>
> It doesn't do this AFAICT, but it's an interesting idea. Well,
> something to be optimised when we can benchmark it on hardware.
>
> One issue though is when some (or all) devices are coherent. In this
> case, even if you use a bounce buffer, it needs to be mapped coherently
> at the CPU level (e.g. the device has its own cache that can be
> snooped, so CPU accesses need to be cacheable).

Do you mean there are cases where an uncached buffer is not coherent
with a device cache?

	Arnd