On 2024-12-11 2:31 am, Baochen Qiang wrote:
On 12/11/2024 7:06 AM, Tim Harvey wrote:
On Mon, Dec 9, 2024 at 8:11 PM Christoph Hellwig <hch@xxxxxx> wrote:
On Mon, Dec 09, 2024 at 11:15:02AM -0800, Tim Harvey wrote:
After a lot of back and forth and investigation this is due to the
IMX8M SoC's not having an IOMMU thus swiotlb is being used and ath11k
is requesting some buffers that are too large for swiotlb to provide.
There is a specific patch which added the HAL_WBM2SW_RELEASE buffers
to cacheable memory that could be reverted to fix this but the concern
was that it would impact performance moving those buffers to
non-cacheable memory (there are three ~1MiB buffers being allocated):
commit d0e2523bfa9cb ("ath11k: allocate HAL_WBM2SW_RELEASE ring from
cacheable memory").
The combination of "buffers" and "swiotlb" sounds like Robin was right
below.
The chain of events as best I can tell are:
commit 6452f0a3d565 ("ath11k: allocate dst ring descriptors from
cacheable memory")
- Nov 12 2021 (made it into Linux 5.17)
- changes allocation of reo_dst rings to cacheable memory to allow
cached descriptor access to optimize CPU usage
- this is flawed because it uses virt_to_phys() to allocate cacheable
memory which does not work on systems with an IOMMU enabled or using
software IOMMU (swiotlb); this causes a kernel crash on client
association
And this is where it started to take a wrong turn, that everyhing
later basically made worse. If you have long living and potentially
large DMA allocations, you need to use dma_alloc_* interfaces.
5.17 already had dma_alloc_pages for quite a while which was and still is
the proper interface to use. For much older kernel you'd be stuck
with dma_alloc_noncoherent or dma_alloc_attrs with the right flag,
but even that would have been much better.
Christoph,
I'm not clear what you are suggesting be done here. Are you suggesting
that ath11k has been using the wrong mechanism by calling
dma_map_single for cached DMA buffers? I'm not all that familiar with
ath11k so I can't tell what buffers are considered long living.
those buffers are allocated when driver load and freed when driver unload, so IMO they are
long living.
The point is that if this driver wants a notion of "cached DMA buffers",
then it should allocate such buffers the proper way, not try to reinvent
it badly. That means using dma_alloc_pages(), or modern
dma_alloc_noncoherent() which is essentially the same thing but with the
dma_map_page() call automatically done for you as well.
Thanks,
Robin.