On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> This bug was observed on parisc, but I would expect it to affect all
> architectures with virtually indexed caches.

I don't think your proposed solution will work for ARM with speculative
prefetching (iow, the latest ARM CPUs.)

If there is a mapping present, it can be speculatively prefetched from
at any time - the CPU designers have placed no bounds on the amount of
speculative prefetching which may be present in a design.

What this means is that for DMA, we will need to handle cache coherency
issues both before and after DMA.

If we're going to allow non-direct mapped (offset mapped in your
parlance) block IO, it makes it impossible to handle cache coherency
after DMA completion - although we can translate (via page table walks)
from a virtual address to a physical, and then to a bus address for DMA,
going back the other way is impossible since there could be many right
answers.

What has been annoying me for a while about the current DMA API is that
drivers have to carry around all sorts of information for a DMA mapping,
whether the architecture needs it or not - and sometimes that
information is not what the architecture wants.

To this end, I've been thinking about something more like:

	struct dma_mapping map;

	err = dma2_map_single(&map, buffer, size, direction);
	if (err)
		...

	addr = dma2_addr(&map);
	/* program controller */

	/* completion */
	dma2_unmap_single(&map);

with similar style interfaces for pages and so forth (scatterlists are
already arch-defined.)  Architectures define the contents of
struct dma_mapping - but it must contain at least the dma address.

What's the advantage of this?  It means that if an architecture needs to
handle cache issues after DMA on unmap via a virtual address, it can
ensure that the correct address is passed through all the way to the
unmap function.
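As a rough illustration of the interface proposed above, here is a
minimal user-space mock-up. It is only a sketch under stated
assumptions: nothing here is an existing kernel API. The members of
struct dma_mapping other than the dma address, the dma2_direction enum,
the dma2_addr_t type, the identity address translation and the
cache-maintenance counters are all hypothetical and exist only to show
how the virtual address, size and direction could travel inside the
mapping from map to unmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not a real kernel interface. */
enum dma2_direction { DMA2_TO_DEVICE, DMA2_FROM_DEVICE, DMA2_BIDIRECTIONAL };

typedef uint64_t dma2_addr_t;

/*
 * Architecture-defined. Only dma_addr is mandatory in the proposal; the
 * other members are what a virtually indexed cache architecture might
 * choose to remember (an assumption made for this sketch).
 */
struct dma_mapping {
	dma2_addr_t dma_addr;
	void *vaddr;
	size_t size;
	enum dma2_direction dir;
};

/* Stand-ins for real cache maintenance: they only count invocations. */
static int cache_ops_before;
static int cache_ops_after;

static void mock_cache_maint(const void *vaddr, size_t size, int *counter)
{
	(void)vaddr;
	(void)size;
	(*counter)++;
}

/* Record everything unmap will need, then do the pre-DMA cache work. */
int dma2_map_single(struct dma_mapping *map, void *buffer, size_t size,
		    enum dma2_direction dir)
{
	if (!map || !buffer || size == 0)
		return -1;

	map->vaddr = buffer;
	map->size = size;
	map->dir = dir;
	/* Mock translation: pretend the bus address equals the pointer. */
	map->dma_addr = (dma2_addr_t)(uintptr_t)buffer;

	mock_cache_maint(buffer, size, &cache_ops_before);
	return 0;
}

dma2_addr_t dma2_addr(const struct dma_mapping *map)
{
	return map->dma_addr;
}

/* The driver passes only the mapping; the virtual address comes with it. */
void dma2_unmap_single(struct dma_mapping *map)
{
	assert(map != NULL && map->vaddr != NULL);

	/* Post-DMA maintenance is needed when the device may have written. */
	if (map->dir != DMA2_TO_DEVICE)
		mock_cache_maint(map->vaddr, map->size, &cache_ops_after);

	map->vaddr = NULL;
	map->size = 0;
}
```

A driver using this mock would call dma2_map_single(), program its
controller with dma2_addr(&map), and on completion call
dma2_unmap_single(&map) without separately tracking the buffer, size or
direction.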

This approach also relieves the driver writer from having to carry
around the direction, size and dma address themselves, which means we
don't need the DMA debug infrastructure to check that drivers are doing
these things correctly.

I seriously doubt, though, that we can revise the DMA API...

In your (and my) case, maybe struct scatterlist also needs to contain
the virtual address as well as the struct page, offset and length?

PS, ARM already does not allow anything but direct-mapped RAM addresses
for dma_map_single(), since we need to be able to translate virtual
addresses to physical for non-coherent L2 cache handling - L1 cache
needs handling via the virtual address and L2 via the physical address.

PPS, you're not the only architecture which has problems with XFS.  ARM
has a long-standing issue with it too.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of: