On Thu, Mar 31, 2016 at 04:45:57PM +0200, Boris Brezillon wrote: > Hi Russell, > > On Thu, 31 Mar 2016 15:14:13 +0100 > Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> wrote: > > > On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote: > > > sg_alloc_table_from_buf() provides an easy solution to create an sg_table > > > from a virtual address pointer. This function takes care of dealing with > > > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum > > > DMA transfer size). > > > > Please note that the DMA API does not take account of coherency of memory > > regions other than non-high/lowmem - there are specific extensions to > > deal with this. > > Ok, you said 'non-high/lowmem', this means vmalloced and kmapped buffers > already fall in this case, right? > > Could you tell me more about those specific extensions? I was slightly confused - the extensions I was thinking of are those listed at the bottom of Documentation/cachetlb.txt, which have nothing to do with DMA. However, it's probably worth reading Documentation/DMA-API-HOWTO.txt to read up on what kinds of memory are considered to be DMA-able in the kernel. > > What this means is that having an API that takes any virtual address > > pointer, converts it to a scatterlist which is then DMA mapped, is > > unsafe. > > Which means some implementations already get this wrong (see > spi_map_buf(), and I'm pretty sure it's not the only one). Quite possible, but that is driver stuff, and driver stuff gets things wrong all the time. :) > > It'll be okay for PIPT and non-aliasing VIPT cache architectures, but > > for other cache architectures this will hide this problem and make > > review harder. > > > > Ok, you lost me. I'll have to do my homework and try to understand what > this means :). P = physical address V = virtual address I = indexed T = tag The tag is held in each cache line. When a location is looked up in the cache, an index is used to locate a set of cache lines and the tag is compared to check which cache line in the set is the correct one (or whether the address even exists in the cache.) How the index and tag are derived varies between cache architectures. PIPT = indexed by physical address, tagged with physical address. Never aliases with itself in the presence of multiple virtual mappings. VIPT = indexed by virtual address, tagged with physical address. If the bits from the virtual address do not overlap the MMU page size, it is also alias free, otherwise aliases can exist, but can be eliminated by "cache colouring" - ensuring that a physical address is always mapped with the same overlapping bits. VIVT = indexed by virtual address, tagged with virtual address. The worst kind of cache, since every different mapping of the same physical address is guaranteed by design to alias with other mappings. There is little cache colouring between different kernel mappings (eg, between lowmem and vmalloc space.) What this means is that, while the DMA API takes care of DMA aliases in the lowmem mappings, an alias-prone VIPT cache will remain incoherent with DMA if it is remapped into vmalloc space, and the mapping happens to have a different cache colour. In other words, this is a data corruption issue. Hence, taking a range of vmalloc() addresses, converting them into a scatterlist, then using the DMA API on the scatterlist _only_ guarantees that the lowmem (and kmap'd highmem mappings) are coherent with DMA. There is no way for the DMA API to know that other mappings exist, and obviously flushing every possible cache line just because a mapping might exist multiplies the expense of the DMA API: not only in terms of time spent running through all the possibilities, which doubles for every aliasing bit of VIPT, but also TLB pressure since you'd have to create a mapping for each alias and tear it back down. VIVT is even worse, since there is no other virtual mapping which is coherent, would need to be known, and each mapping would need to be individually flushed. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html