Re: [PATCH v2 2/2] treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc() allocations

Catalin Marinas <catalin.marinas@xxxxxxx> · Wed, 2 Nov 2022 13:10:53 +0000

On Tue, Nov 01, 2022 at 06:14:58PM +0000, Robin Murphy wrote:
> On 2022-11-01 17:19, Catalin Marinas wrote:
> > The bouncing currently is all or nothing with iommu_dma_map_sg(), unlike
> > dma_direct_map_sg() which ends up calling dma_direct_map_page() and we
> > can do the bouncing per element. So I was looking to untangle
> > iommu_dma_map_sg() in a similar way but postponed it as too complicated.
> > 
> > As a less than optimal solution, we can force bouncing for the whole
> > list if any of the sg elements is below the alignment size. Hopefully we
> > won't have many such mixed size cases.
> 
> Sounds like you may have got the wrong impression - the main difference with
> iommu_dma_map_sg_swiotlb() is that it avoids trying to do any of the clever
> concatenation stuff, and simply maps each segment individually with
> iommu_dma_map_page(), exactly like dma-direct; only segments which need
> bouncing actually get bounced.

You are right, iommu_dma_map_page() is called for each element if
bouncing is needed. But without scanning the sg list separately,
dev_use_swiotlb() would have to return true for all non-coherent devices
to force the mapping through that path. As you said below, this would
break some use-cases.

> What sadly wouldn't work is just adding extra conditions to
> dev_use_swiotlb() to go down the existing bounce-if-necessary path for all
> non-coherent devices, since there are non-coherent users of dma-buf and v4l2
> which (for better or worse) depend on the clever concatenation stuff
> happening.

Would such cases have a length < ARCH_DMA_MINALIGN for any of the
scatterlist elements? If not, maybe scanning the list first would work,
though we probably do need a dma_flag to avoid scanning it again for
sync and unmap.
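
To make the idea concrete, here is a rough userspace sketch of the scan
plus a cached result, not actual kernel code. struct sg_elem, struct
sg_map_state and both helpers are made up for illustration, and the
value 128 for ARCH_DMA_MINALIGN is only assumed here (it is what arm64
uses today):

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 128 as on arm64; the real value is per-architecture. */
#define ARCH_DMA_MINALIGN 128

/* Hypothetical stand-in for struct scatterlist; only the length matters. */
struct sg_elem {
	size_t length;
};

/* Hypothetical per-mapping state, playing the role of the "dma_flag". */
struct sg_map_state {
	bool scanned;
	bool needs_bounce;
};

/* Return true if any element is shorter than the DMA alignment. */
static bool sg_needs_bounce(const struct sg_elem *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		if (sg[i].length < ARCH_DMA_MINALIGN)
			return true;
	return false;
}

/*
 * Scan once at map time and remember the answer, so that the sync and
 * unmap paths can reuse it instead of walking the list again.
 */
static bool sg_map_use_bounce(struct sg_map_state *st,
			      const struct sg_elem *sg, size_t nents)
{
	if (!st->scanned) {
		st->needs_bounce = sg_needs_bounce(sg, nents);
		st->scanned = true;
	}
	return st->needs_bounce;
}
```

In this sketch a list with no short elements would keep going down the
existing concatenating path, and only a list with at least one short
element would take the per-element bounce path.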

-- 
Catalin