On Tue, Nov 01, 2022 at 06:39:40PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 01, 2022 at 05:32:14PM +0000, Catalin Marinas wrote:
> > There's also the case of low-end phones with all RAM below 4GB and arm64
> > doesn't allocate the swiotlb. Not sure those vendors would go with a
> > recent kernel anyway.
> >
> > So the need for swiotlb now changes from 32-bit DMA to any DMA
> > (non-coherent but we can't tell upfront when booting, devices may be
> > initialised pretty late).

Not only low-end phones, but there are other form-factors that can fall
into this category and are also memory constrained (e.g. wearable
devices), so the memory headroom impact from enabling SWIOTLB might be
non-negligible for all of these devices. I also think it's feasible for
those devices to use recent kernels.

>
> Yes. The other option would be to use the dma coherent pool for the
> bouncing, which must be present on non-coherent systems anyway. But
> it would require us to write a new set of bounce buffering routines.

I think in addition to having to write new bounce buffering routines,
this approach still suffers the same problem as SWIOTLB, which is that
the memory for SWIOTLB and/or the dma coherent pool is not reclaimable,
even when it is not used.

There's not enough context in the DMA mapping routines to know if we
need an atomic allocation, so if we used kmalloc(), instead of SWIOTLB,
to dynamically allocate memory, it would always have to use GFP_ATOMIC.

But what about having a pool that has a small amount of memory and is
composed of several objects that can be used for small DMA transfers?
If the amount of memory in the pool starts falling below a certain
threshold, there can be a worker thread--so that we don't have to use
GFP_ATOMIC--that can add more memory to the pool?

--Isaac
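
To make the pool idea above a bit more concrete, here is a rough
userspace sketch, not kernel code. Every name in it (bounce_pool,
bounce_pool_get(), the watermark values) is hypothetical, malloc()
stands in for a sleeping GFP_KERNEL-style allocation, and setting
refill_pending stands in for queueing work to a worker thread. Locking
is left out; a real version would presumably need a spinlock around the
free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All sizes and thresholds below are assumptions for illustration. */
#define BOUNCE_OBJ_SIZE 256 /* one small bounce buffer */
#define POOL_LOW_WATER  4   /* ask for a refill below this many free objects */
#define POOL_TARGET     8   /* fill level the worker restores */

struct bounce_obj {
	struct bounce_obj *next;
	unsigned char data[BOUNCE_OBJ_SIZE];
};

struct bounce_pool {
	struct bounce_obj *free_list;
	size_t nr_free;
	bool refill_pending; /* set by the map path, cleared by the worker */
};

/*
 * Map path: may be called from atomic context, so it never allocates.
 * It only pops a preallocated object and flags the worker when the
 * pool runs low. Returns NULL if the pool is empty.
 */
static struct bounce_obj *bounce_pool_get(struct bounce_pool *pool)
{
	struct bounce_obj *obj = pool->free_list;

	if (obj) {
		pool->free_list = obj->next;
		pool->nr_free--;
	}
	if (pool->nr_free < POOL_LOW_WATER)
		pool->refill_pending = true; /* stand-in for queueing work */

	return obj;
}

/* Unmap path: hand the object back to the pool. */
static void bounce_pool_put(struct bounce_pool *pool, struct bounce_obj *obj)
{
	assert(obj != NULL);
	obj->next = pool->free_list;
	pool->free_list = obj;
	pool->nr_free++;
}

/*
 * Worker: runs in process context, so a sleeping allocation is fine
 * here and the map path never needs an atomic one.
 */
static void bounce_pool_refill(struct bounce_pool *pool)
{
	while (pool->nr_free < POOL_TARGET) {
		struct bounce_obj *obj = malloc(sizeof(*obj));

		if (!obj)
			break; /* retry on the next refill request */
		bounce_pool_put(pool, obj);
	}
	pool->refill_pending = false;
}
```

The sketch only grows the pool. Shrinking it again when it sits idle,
which is what would address the reclaim concern, is not shown.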