On Wed, Dec 11, 2024 at 9:24 PM Baochen Qiang <quic_bqiang@xxxxxxxxxxx> wrote:
>
>
>
> On 12/11/2024 9:44 PM, Robin Murphy wrote:
> > On 2024-12-11 2:31 am, Baochen Qiang wrote:
> >>
> >>
> >> On 12/11/2024 7:06 AM, Tim Harvey wrote:
> >>> On Mon, Dec 9, 2024 at 8:11 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >>>>
> >>>> On Mon, Dec 09, 2024 at 11:15:02AM -0800, Tim Harvey wrote:
> >>>>> After a lot of back and forth and investigation this is due to the
> >>>>> IMX8M SoCs not having an IOMMU, thus swiotlb is being used and ath11k
> >>>>> is requesting some buffers that are too large for swiotlb to provide.
> >>>>> There is a specific patch which added the HAL_WBM2SW_RELEASE buffers
> >>>>> to cacheable memory that could be reverted to fix this, but the concern
> >>>>> was that it would impact performance moving those buffers to
> >>>>> non-cacheable memory (there are three ~1MiB buffers being allocated):
> >>>>> commit d0e2523bfa9cb ("ath11k: allocate HAL_WBM2SW_RELEASE ring from
> >>>>> cacheable memory").
> >>>>
> >>>> The combination of "buffers" and "swiotlb" sounds like Robin was right
> >>>> below.
> >>>>
> >>>>> The chain of events as best I can tell is:
> >>>>>
> >>>>> commit 6452f0a3d565 ("ath11k: allocate dst ring descriptors from
> >>>>> cacheable memory")
> >>>>> - Nov 12 2021 (made it into Linux 5.17)
> >>>>> - changes allocation of reo_dst rings to cacheable memory to allow
> >>>>> cached descriptor access to optimize CPU usage
> >>>>> - this is flawed because it uses virt_to_phys() to allocate cacheable
> >>>>> memory, which does not work on systems with an IOMMU enabled or using
> >>>>> the software IOMMU (swiotlb); this causes a kernel crash on client
> >>>>> association
> >>>>
> >>>> And this is where it started to take a wrong turn, that everything
> >>>> later basically made worse. If you have long living and potentially
> >>>> large DMA allocations, you need to use dma_alloc_* interfaces.
> >>>>
> >>>> 5.17 already had dma_alloc_pages for quite a while, which was and still is
> >>>> the proper interface to use. For much older kernels you'd be stuck
> >>>> with dma_alloc_noncoherent or dma_alloc_attrs with the right flag,
> >>>> but even that would have been much better.
> >>>
> >>> Christoph,
> >>>
> >>> I'm not clear what you are suggesting be done here. Are you suggesting
> >>> that ath11k has been using the wrong mechanism by calling
> >>> dma_map_single for cached DMA buffers? I'm not all that familiar with
> >>> ath11k so I can't tell which buffers are considered long living.
> >>
> >> Those buffers are allocated when the driver loads and freed when it
> >> unloads, so IMO they are long living.
> >
> > The point is that if this driver wants a notion of "cached DMA buffers", then it should
> > allocate such buffers the proper way, not try to reinvent it badly. That means using
> > dma_alloc_pages(), or modern dma_alloc_noncoherent() which is essentially the same thing
> > but with the dma_map_page() call automatically done for you as well.
>
> Yeah, you are right, Robin. I didn't know there are convenient interfaces like these already.
>
> Tim, can you work out a patch then?
>

How about:

diff --git a/drivers/net/wireless/ath/ath11k/dp.c b/drivers/net/wireless/ath/ath11k/dp.c
index fbf666d0ecf1..557e06187e95 100644
--- a/drivers/net/wireless/ath/ath11k/dp.c
+++ b/drivers/net/wireless/ath/ath11k/dp.c
@@ -105,9 +105,8 @@ void ath11k_dp_srng_cleanup(struct ath11k_base *ab, struct dp_srng *ring)
 		return;
 
 	if (ring->cached) {
-		dma_unmap_single(ab->dev, ring->paddr_unaligned, ring->size,
-				 DMA_FROM_DEVICE);
-		kfree(ring->vaddr_unaligned);
+		dma_free_noncoherent(ab->dev, ring->size, ring->vaddr_unaligned,
+				     ring->paddr_unaligned, DMA_FROM_DEVICE);
 	} else {
 		dma_free_coherent(ab->dev, ring->size, ring->vaddr_unaligned,
 				  ring->paddr_unaligned);
@@ -249,28 +248,18 @@ int ath11k_dp_srng_setup(struct ath11k_base *ab, struct dp_srng *ring,
 		default:
 			cached = false;
 		}
-
-		if (cached) {
-			ring->vaddr_unaligned = kzalloc(ring->size, GFP_KERNEL);
-			if (!ring->vaddr_unaligned)
-				return -ENOMEM;
-
-			ring->paddr_unaligned = dma_map_single(ab->dev,
-							       ring->vaddr_unaligned,
-							       ring->size,
-							       DMA_FROM_DEVICE);
-			if (dma_mapping_error(ab->dev, ring->paddr_unaligned)) {
-				kfree(ring->vaddr_unaligned);
-				ring->vaddr_unaligned = NULL;
-				return -ENOMEM;
-			}
-		}
 	}
 
-	if (!cached)
+	if (cached) {
+		ring->vaddr_unaligned = dma_alloc_noncoherent(ab->dev, ring->size,
+							      &ring->paddr_unaligned,
+							      DMA_FROM_DEVICE,
+							      GFP_KERNEL);
+	} else {
 		ring->vaddr_unaligned = dma_alloc_coherent(ab->dev, ring->size,
 							   &ring->paddr_unaligned,
 							   GFP_KERNEL);
+	}
 
 	if (!ring->vaddr_unaligned)
 		return -ENOMEM;

If this is what we are talking about here, I can submit that with a proper
commit log.

Note there are a lot of other calls to dma_map_single() in the ath drivers;
my understanding is that those may be fine for small, short-lived buffers,
but I'm not clear whether that is always how they are used.

Best Regards,

Tim

> >
> > Thanks,
> > Robin.
>
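
For anyone following along, a minimal sketch of the pattern Robin and
Christoph describe: one call that returns CPU-cached, already-mapped memory
for a long-lived ring, with the driver doing explicit cache maintenance
around CPU accesses. This is not ath11k code; the my_ring names and the
DMA_FROM_DEVICE direction are illustrative assumptions, and it assumes a
kernel recent enough to have the current dma_alloc_noncoherent() signature.

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

struct my_ring {
	void *vaddr;
	dma_addr_t paddr;
	size_t size;
};

static int my_ring_alloc(struct device *dev, struct my_ring *ring, size_t size)
{
	ring->size = size;
	/* One call allocates cacheable memory and maps it for the device;
	 * no separate kzalloc() + dma_map_single() step, and the DMA address
	 * comes from the DMA API rather than virt_to_phys(). */
	ring->vaddr = dma_alloc_noncoherent(dev, size, &ring->paddr,
					    DMA_FROM_DEVICE, GFP_KERNEL);
	return ring->vaddr ? 0 : -ENOMEM;
}

static void my_ring_process(struct device *dev, struct my_ring *ring)
{
	/* Transfer ownership to the CPU before reading what the device wrote. */
	dma_sync_single_for_cpu(dev, ring->paddr, ring->size, DMA_FROM_DEVICE);

	/* ... read/parse descriptors from ring->vaddr here ... */

	/* Hand the buffer back to the device. */
	dma_sync_single_for_device(dev, ring->paddr, ring->size, DMA_FROM_DEVICE);
}

static void my_ring_free(struct device *dev, struct my_ring *ring)
{
	/* Unmaps and frees in one call, mirroring the allocation. */
	dma_free_noncoherent(dev, ring->size, ring->vaddr, ring->paddr,
			     DMA_FROM_DEVICE);
}

The point of the noncoherent variant is that it keeps the performance
benefit of cached descriptor reads while letting the DMA API, rather than
the driver, decide how the memory is mapped for the device.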