Hi Thierry,

On Thursday, April 12, 2012 9:18 AM Thierry Reding wrote:

> * Arnd Bergmann wrote:
> > On Wednesday 11 April 2012, Thierry Reding wrote:
> > > Daniel Vetter wrote:
> > > > Well, you use the iommu api to map/unmap memory into the iommu for tegra,
> > > > whereas usually device drivers just use the dma api to do that. The usual
> > > > interface is dma_map_sg/dma_unmap_sg, but there are quite a few variants
> > > > around. I'm just wondering why you've chosen this.
> > >
> > > I don't think this works on ARM. Maybe I'm not seeing the whole picture but
> > > judging by a quick look through the kernel tree there aren't any users that
> > > map DMA memory through an IOMMU.
> >
> > dma_map_sg is certainly the right interface to use, and Marek Szyprowski has
> > patches to make that work on ARM, hopefully going into v3.5, so you could
> > use those.
>
> I've looked at Marek's patches but I don't think they'll work for Tegra 2 or
> Tegra 3. The corresponding iommu_map() functions only set one PTE, regardless
> of the number of bytes passed to them. However, the Tegra TRM indicates that
> mapping needs to be done on a per-page basis so contiguous regions cannot be
> combined. I suppose the IOMMU driver would have to be fixed to program more
> than a single page in that case.

I assume you want to map a set of pages into a contiguous chunk in the IO
address space. This can be done with a dma_map_sg() call once an IOMMU-aware
implementation has been assigned to the given device. The DMA-mapping
implementation is able to merge consecutive chunks of the scatter list in the
DMA/IO address space if possible (i.e. if there are no in-page offsets between
the chunks). With my implementation of IOMMU-aware dma-mapping you usually get
a single DMA chunk from the provided scatter list.

I know that this approach causes a lot of confusion at first look, but that is
how the DMA mapping API has been designed.
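To make the merging rule concrete, here is a small userspace model in C. It is
a sketch under stated assumptions, not the ARM dma-mapping code: the structs
sg_chunk and dma_seg, the toy page table iopt, the fixed 4 KiB page size and
the function map_sg_model() are all hypothetical. Each chunk is placed at the
next free IO page, one PTE is programmed per page touched (the per-page
mapping the Tegra TRM asks for), and a chunk is merged into the previous DMA
segment only when their IO address ranges touch, which happens exactly when the
previous chunk ends on a page boundary and the next one starts at offset 0.

```c
/*
 * Hypothetical userspace model of IOMMU-aware scatter-list mapping.
 * None of these types or functions exist in the kernel; they only
 * illustrate the merging rule, not struct scatterlist or the ARM code.
 */
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE    4096UL  /* assumed page size */
#define IOPT_ENTRIES 64      /* size of the toy IO page table */

/* Simplified chunk: physically contiguous pages starting at pfn,
 * with an in-page offset and a byte length. */
struct sg_chunk {
	unsigned long pfn;
	unsigned long offset;
	unsigned long length;
};

/* One segment as seen by the device: IO virtual address and length. */
struct dma_seg {
	unsigned long iova;
	unsigned long length;
};

/* Toy IO page table: index is the IO page number, value is the pfn. */
static unsigned long iopt[IOPT_ENTRIES];

/*
 * Map nents chunks into a contiguous IO window starting at IO page 0,
 * writing one PTE per page, and merge chunks whose IO ranges touch.
 * out must have room for nents segments. Returns the number of
 * segments, or -1 if the toy page table is too small.
 */
static int map_sg_model(const struct sg_chunk *sg, size_t nents,
			struct dma_seg *out)
{
	unsigned long io_page = 0;	/* next free IO page */
	int nsegs = 0;

	assert(sg != NULL && out != NULL);

	for (size_t i = 0; i < nents; i++) {
		unsigned long npages =
			(sg[i].offset + sg[i].length + PAGE_SIZE - 1) / PAGE_SIZE;
		unsigned long addr = io_page * PAGE_SIZE + sg[i].offset;

		if (io_page + npages > IOPT_ENTRIES)
			return -1;

		/* Per-page mapping: one PTE for every page the chunk touches. */
		for (unsigned long p = 0; p < npages; p++)
			iopt[io_page + p] = sg[i].pfn + p;
		io_page += npages;

		/* Merge only if this chunk starts where the last segment ends,
		 * i.e. the previous chunk ended on a page boundary and this
		 * one starts at in-page offset 0. */
		if (nsegs > 0 &&
		    out[nsegs - 1].iova + out[nsegs - 1].length == addr) {
			out[nsegs - 1].length += sg[i].length;
		} else {
			out[nsegs].iova = addr;
			out[nsegs].length = sg[i].length;
			nsegs++;
		}
	}
	return nsegs;
}
```

For example, three page-aligned chunks of 1, 1 and 2 pages taken from
scattered physical pages (pfn 100, 7 and 42), followed by a 1024-byte chunk at
in-page offset 512, give two segments in this model: a single 16 KiB segment
at IO address 0 and a second segment at 0x4200 for the offset chunk.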
The scatter list based approach has some drawbacks: it is a bit oversized for
most of the typical gfx/multimedia buffer use cases, but that's all we have
now. Scatter lists were initially designed for disk-based block I/O
operations, hence the in-page offset and length stored for each chunk. For
multimedia use cases, providing an array of struct pages and asking
dma-mapping to map them into a contiguous area of the IO address space is
probably all we need.

I wonder if introducing such new calls is a good idea. Arnd, what do you
think? It would definitely simplify the drivers and make the code easier to
understand. On the other hand, it requires a significant amount of work in the
dma-mapping framework for all architectures, but that's not a big issue for
me.

> Also this doesn't yet solve the vmap() problem that is needed for the kernel
> virtual mapping. I did try using dma_alloc_writecombine(), but that only
> works for chunks of 2 MB or smaller, unless I use init_consistent_dma_size()
> during board setup, which isn't provided for in a DT setup. I couldn't find
> a better alternative, but I admit I'm not very familiar with all the VM APIs.
> Do you have any suggestions on how to solve this? Otherwise I'll try and dig
> in some more.

Yes, I'm aware of this issue and I'm currently working on solving it. I hope
to use the standard vmalloc range for all coherent/writecombine allocations
and get rid of the custom 'consistent_dma' region altogether.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center