On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> Now, longer-term, perhaps we should do the following:
>
> #ifdef CONFIG_ZONE_DMA32
> #define OPT_ZONE_DMA32 ZONE_DMA32
> #elif defined(CONFIG_64BIT)
> #define OPT_ZONE_DMA OPT_ZONE_DMA
> #else
> #define OPT_ZONE_DMA32 ZONE_NORMAL
> #endif
>
> Then we wouldn't need the ifdef here and could always use GFP_DMA32
> | GFP_KERNEL.  Would need to audit current users and make sure they
> wouldn't be broken by such a change.

Argh, I forgot to say the most important thing.  (For those newly
invited to the party, we're talking about drivers/media, in particular
drivers/media/common/saa7146/saa7146_core.c, functions
saa7146_vmalloc_build_pgtable and vmalloc_to_sg)

I think we're missing a function in our DMA API.  These drivers don't
actually need physical memory below the 4GB mark.  They need DMA
addresses which are below the 4GB mark.  For machines with IOMMUs, this
can mean no restrictions on physical memory.  If we don't have an IOMMU,
then a bounce buffer could be used (but would be slow) -- like the
swiotlb.  So we should endeavour to allocate memory below the 4GB
boundary on systems with no IOMMU, but can allocate memory anywhere on
systems with an IOMMU.

For consistent / coherent memory, we have an allocation function.
But we don't have an allocation function for streaming memory, which is
what these drivers want.  They also flush the DMA memory and then access
the memory through a different virtual mapping, which I'm not sure is
going to work well on virtually-indexed caches like SPARC and PA-RISC
(maybe not MIPS either?)

I think we want something like

	struct scatterlist *dma_alloc_sg(struct device *dev, int *nents);
	void dma_free_sg(struct device *dev, struct scatterlist *sg, int nents);

That lets individual architectures decide where to allocate, and handle
the tradeoff between allocating below 4GB and using bounce buffers.
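
To make the allocation policy concrete, here is a rough userspace model
of the decision such a function might make.  This is not kernel code and
not a proposed implementation: struct mock_device, its has_iommu field,
the MOCK_ZONE_* names and mock_sg_alloc_zone() are all made up for
illustration, and a real version would be per-architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of struct device that matter here. */
struct mock_device {
	bool has_iommu;     /* assumption: an IOMMU remaps this device's DMA */
	uint64_t dma_mask;  /* highest DMA address the device can generate */
};

/* Hypothetical zone choices; they only mirror the names in the thread. */
enum mock_zone { MOCK_ZONE_DMA32, MOCK_ZONE_NORMAL };

#define MOCK_DMA_MASK_32BIT 0xffffffffULL

/*
 * Decide where the pages backing a streaming allocation may come from.
 * With an IOMMU the DMA address is decoupled from the physical address,
 * so physical placement is unrestricted.  Without one, a device limited
 * to 32-bit DMA addresses gets memory below 4GB so that no bounce
 * buffer is needed.
 */
static enum mock_zone mock_sg_alloc_zone(const struct mock_device *dev)
{
	if (dev->has_iommu)
		return MOCK_ZONE_NORMAL;
	if (dev->dma_mask > MOCK_DMA_MASK_32BIT)
		return MOCK_ZONE_NORMAL;
	return MOCK_ZONE_DMA32;
}
```

The point of hiding this behind one call is that the driver never has
to pick GFP_DMA32 itself; it just asks for a scatterlist.
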
I don't have a good answer to synchronising between device-view of
memory and CPU-view-through-vmalloc though.  They're already calling
dma_sync_*_for_cpu(); do they need to also call a new

	vflush(void *p, unsigned long len)

function which can be a no-op on x86 and flushes the range on
SPARC/PA-RISC/...?
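
As a sketch of the shape I have in mind, again as a userspace model
rather than kernel code: the MOCK_ARCH_VIRTUALLY_INDEXED_CACHE switch,
mock_arch_flush_range() and the byte counter are invented for
illustration, and the function is named mock_vflush() to make clear it
is only a stand-in for the vflush() idea above.

```c
/*
 * Assumption: an architecture with a virtually-indexed data cache would
 * define this to 1; it defaults to 0 to model x86.
 */
#ifndef MOCK_ARCH_VIRTUALLY_INDEXED_CACHE
#define MOCK_ARCH_VIRTUALLY_INDEXED_CACHE 0
#endif

/* Counts bytes "flushed" so the behaviour can be observed in a test. */
static unsigned long mock_flushed_bytes;

/* Hypothetical arch hook; a real one would flush the cache lines. */
static void mock_arch_flush_range(void *p, unsigned long len)
{
	(void)p;
	mock_flushed_bytes += len;
}

/*
 * Make the vmalloc alias of [p, p + len) coherent with what the device
 * wrote.  Does nothing when the cache is not virtually indexed.
 */
static inline void mock_vflush(void *p, unsigned long len)
{
	if (MOCK_ARCH_VIRTUALLY_INDEXED_CACHE)
		mock_arch_flush_range(p, len);
}
```

A driver would call this on the vmalloc address after
dma_sync_*_for_cpu() and before reading the buffer.
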