On Fri, Jan 24, 2025 at 10:42:56AM +0800, Furong Xu wrote:
> On Thu, 23 Jan 2025 22:48:42 +0100, Andrew Lunn <andrew@xxxxxxx> wrote:
>
> > > Just to clarify, the patch that you had us try was not intended as an actual
> > > fix, correct? It was only for diagnostic purposes, i.e. to see if there is
> > > some kind of cache coherence issue, which seems to be the case? So perhaps
> > > the only fix needed is to add dma-coherent to our device tree?
> >
> > That sounds quite error prone. How many other DT blobs are missing the
> > property? If the memory should be coherent, i would expect the driver
> > to allocate coherent memory. Or the driver needs to handle
> > non-coherent memory and add the necessary flush/invalidates etc.
>
> stmmac driver does the necessary cache flush/invalidates to maintain cache lines
> explicitly.
>
> See dma_sync_single_for_cpu():
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/include/linux/dma-mapping.h#n297
>
> dma_dev_need_sync() is supposed to return false for Tegra234, since the ethernet
> controller on Tegra234 is dma coherent by SoC design as Brad said their
> downstream device tree has dma-coherent turned on by default, and after add
> dma-coherent to mainline ethernet node, stmmac driver works fine.
> But dma-coherent property is missing in mainline Tegra234 ethernet device tree
> node, dma_dev_need_sync() returns true and this is not the expected behavior.

My understanding is that the Ethernet controller itself is not DMA
coherent. Instead, it is by accessing memory through the IOMMU that the
accesses become coherent. It's technically possible to disable the IOMMU
via the command-line, or to simply compile out the driver for it, and
the devices are supposed to keep working in that case (though it's not a
recommended configuration), so exclusively relying on the presence of an
IOMMU node or even a dma-coherent property is bound to fail in these
corner cases.

We don't currently have a good way of describing this in device tree,
but a discussion with the DT maintainers is probably warranted.

> The dma-coherent property in device tree node is SoC specific, so only the
> vendors know if their stmmac ethernet controller is dma coherent and
> whether their device tree are missing the critical dma-coherent property.

What I fail to understand is how dma-coherent can make a difference in
this case. If it's not present, then the driver is supposed to maintain
caches explicitly. But it seems like explicit cache maintenance actually
causes things to break. So do we need to assume that DMA coherent
devices in general won't work if the driver manages caches explicitly?

I always figured dma-coherent was more of an optimization, but this
seems to indicate it isn't.

Thierry
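
For reference, the behaviour being debated above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: struct model_dev, model_need_sync() and model_sync_for_cpu() are hypothetical names invented for illustration, the dma_coherent flag stands in for whatever the dma-coherent property ends up configuring, and the counter stands in for real cache maintenance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; NOT the kernel's struct device. */
struct model_dev {
	bool dma_coherent;      /* assumed to mirror a "dma-coherent" DT property */
	unsigned int cache_ops; /* counts simulated cache maintenance operations */
};

/* In this model, explicit sync is needed only for non-coherent devices. */
static bool model_need_sync(const struct model_dev *dev)
{
	return !dev->dma_coherent;
}

/* Models the CPU taking a buffer back after the device has written to it. */
static void model_sync_for_cpu(struct model_dev *dev, void *buf, size_t len)
{
	(void)buf;
	(void)len;

	if (!model_need_sync(dev))
		return; /* coherent: nothing to do */

	/* Non-coherent: real code would invalidate the CPU cache lines here. */
	dev->cache_ops++;
}
```

In this model, a device marked coherent never triggers a cache operation, while an unmarked one does on every sync. The open question in the thread is why that extra maintenance would break a device whose accesses are in fact coherent, rather than merely cost some performance.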