On Fri, Dec 05, 2014 at 10:52:02AM +0100, Arnd Bergmann wrote: > I'm still puzzled why you'd need a single dma_sync_single_for_cpu() > after dma_alloc_coherent though, you should not need any. Is it possible > that the driver accidentally uses __raw_readl() instead of readl() > in some places and you are just lacking an appropriate barrier? Digging into the driver, it looks like individual DMA buffers are allocated (via brcmf_pcie_init_dmabuffer_for_device) and registered into a "commonring" layer. Whenever the buffer is written to, space is first allocated via a call to brcmf_commonring_reserve_for_write() or brcmf_commonring_reserve_for_write_multiple(), data written to the buffer, followed by a call to brcmf_commonring_write_complete(). brcmf_commonring_write_complete() calls two methods at that point: cr_write_wptr() and cr_ring_bell(), which will be brcmf_pcie_ring_mb_write_wptr() and brcmf_pcie_ring_mb_ring_bell(). The first calls brcmf_pcie_write_tcm16(), which uses iowrite16(), which contains the appropriate barrier. The bell ringing functions also use ioread*/iowrite*(). So, on the write side, it looks fine from the barrier perspective. On the read side, brcmf_commonring_get_read_ptr() is used before a read access to the ring - which calls the cr_update_wptr() method, which in turn uses an ioread16() call. After the CPU has read data from the ring, brcmf_commonring_read_complete() is used, which uses iowrite16(). So, I don't see a barrier problem on the read side. However, I did trip over this: static void * brcmf_pcie_init_dmabuffer_for_device(struct brcmf_pciedev_info *devinfo, u32 size, u32 tcm_dma_phys_addr, dma_addr_t *dma_handle) { void *ring; long long address; ring = dma_alloc_coherent(&devinfo->pdev->dev, size, dma_handle, GFP_KERNEL); if (!ring) return NULL; address = (long long)(long)*dma_handle; Casting to (long) will truncate the DMA handle to 32-bits on a 32-bit architecture, even if it supports 64-bit DMA addresses. There's a couple of other places where this same truncation occurs: address = (long long)(long)devinfo->shared.scratch_dmahandle; and address = (long long)(long)devinfo->shared.ringupd_dmahandle; In any case, wouldn't using a u64 type for "address" be better - isn't "long long" 128-bit on 64-bit architectures? -- FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html