On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:
> On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:
> > On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
> > > On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
> > > > On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
> > > > > On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
> > > We probably want to default to 32-bit for arm32 in the absence of
> > > dma-ranges. For arm64, I'd prefer if we could always mandate
> > > dma-ranges to be present for each bus, just like we mandate ranges
> > > to be present. I hope it's not too late for that.
> > >
> > > dma_set_mask should definitely look at the dma-ranges properties,
> > > and the helper that Santosh just introduced should give us all the
> > > information we need. We just need to decide on the correct
> > > behavior.
> >
> > Last time I looked at Santosh's patches I thought the dma-ranges
> > property is per device rather than per bus. We could make it per bus
> > only and let the device call dma_set_mask() explicitly if it wants
> > to restrict it further.
>
> Can you check again? I read the code again yesterday to check this,
> and I concluded it was correctly doing this per bus.

You are right, I missed the fact that of_dma_get_range() checks the
dma-ranges property in the node's parent. So what we need is to set the
default dma mask based on the size information in dma-ranges, something
like this:

	mask = rounddown_pow_of_two(size) - 1;
	dma_set_mask(dev, mask);	/* or dma_set_mask_and_coherent() */

> > > > > While we currently don't have a set of swiotlb DMA ops on
> > > > > ARM32, we do have it on ARM64, and I think we should be using
> > > > > them properly. It should really not be hard to implement a
> > > > > proper dma_set_mask() function for ARM64 that is able to set
> > > > > up the swiotlb based on the dma-ranges properties and always
> > > > > returns success but leaves the mask unchanged.
> > > > The swiotlb bounce buffer needs to be pre-allocated at boot,
> > > > otherwise we don't have any guarantees. Since we can't honour
> > > > random masks anyway, we stick to ZONE_DMA which is currently in
> > > > the 4G limit. But the driver calls dma_set_mask() too late for
> > > > any further swiotlb setup.
> > > >
> > > > With IOMMU we can be more flexible around dma_set_mask(); it
> > > > can be done at run-time.
> > >
> > > What we can do with swiotlb is to check if the mask is smaller
> > > than ZONE_DMA. If it ever is, we have to fail dma_set_mask and
> > > hope the driver can fall back to PIO mode or it will have to fail
> > > its probe() function.
> >
> > dma_set_(coherent_)mask checks swiotlb_dma_supported(), which
> > returns false if io_tlb_end goes beyond the device mask. So we just
> > need to ensure that io_tlb is allocated within ZONE_DMA.
>
> Makes sense for dma_set_mask. Why do you do the same thing for
> coherent_mask? Shouldn't that check against ZONE_DMA instead?

It depends on the meaning of coherent_dma_mask. As I understand it,
that's the dma mask used by dma_alloc_coherent() to return a memory
buffer that the device is able to access. I don't see it as much
different from the dma_mask used by the streaming API. I guess some
hardware could have different masks here if it has cache coherency only
for certain memory ranges (and the coherent_dma_mask would be smaller
than dma_mask).

> > > For dma_set_coherent_mask(), we also have to fail any call that
> > > tries to set a mask larger than what the device hardware can do.
> > > Unlike that, dma_set_mask() can succeed with any mask, we just
> > > have to enable swiotlb if the mask that the driver wants is
> > > larger than what the hardware can do.
> >
> > Currently we can't satisfy any arbitrarily small dma mask even with
> > swiotlb, since the bounce buffer is only guaranteed to be in
> > ZONE_DMA.
> > Swiotlb allows for smaller masks, but we need to reserve the io_tlb
> > buffer early during boot and at smaller addresses. For example,
> > swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA) and,
> > if the coherent_dma_mask isn't matched, it frees the pages and
> > falls back to the io_tlb buffer. However, I don't think it's worth
> > going for masks smaller than 32-bit on arm64.
>
> Is that safe for noncoherent systems? I'd expect the io_tlb buffer
> to be cached there, which means we can't use it for coherent
> allocations.

For non-coherent systems, the current arm64 approach is to take the
address returned by swiotlb_alloc_coherent() and remap it as Normal
NonCacheable (see the arm64 __dma_alloc_noncoherent()). The CPU should
only access the dma buffer via the non-cacheable mapping.

Of course, another approach would be to change the cacheability
attributes of the io_tlb buffer (or the CMA one), but the current
approach has the advantage that the io_tlb buffer can be used for both
coherent and non-coherent devices.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html