On Wed, 2024-07-24 at 20:17 +0200, Mikulas Patocka wrote:
> Hi
>
> Thanks for fixing the cache aliasing issues on PA-RISC in the commit
> 72d95924ee35c8cd16ef52f912483ee938a34d49.
>
> I think there is still one problem left - and that is
> ARCH_DMA_MINALIGN. Currently, it is 16, which is obviously wrong.

I don't think that's obvious; why is it wrong?

> Some comments in the kernel say that PA8900 has L2 cache with 128-byte
> line size, so I think that ARCH_DMA_MINALIGN should be 128 as well.

The L2+ caches on PA88 and 89 systems are PIPT and fully coherent with
the PCI bus, so the L2+ line size doesn't matter that much (well,
except we could possibly get better performance with more judicious DMA
alignment). All the parisc coherency protocols rely on the CPU L1
cache, which is still VIPT. Additionally, the CPU architects kept the
minimum line size for the L1 at 16 bytes, so even in the later CPUs
which have larger actual VIPT cache line sizes there's a splitting
mechanism which means they can operate coherency protocols at a line
size of 16. This was done so the only spinlock primitive parisc has
(LDCW) can still operate correctly with only 16 bytes of alignment.

> The question is - can the CPU speculatively mark a cache line as
> dirty and write it back?

No, the CPU may only mark a line as dirty if something actually wrote
to it; it may not do it speculatively. The L1 cache can speculatively
move in clean lines if a TLB entry exists for them, and once a line is
marked dirty it's within the gift of the CPU to decide when to write it
back absent a flush.

> If yes, we have a big problem - Linux assumes that a part of the
> page may be used for DMA transfer and another part of that page may
> be used for normal cacheable structures. If the PA-RISC CPU
> speculatively prefetched and wrote back a cache line, it could
> corrupt the DMA transfer.

The L2 PIPT PCI coherence protocol ensures that DMA can't corrupt
objects adjacent in memory on PA88 and 89.
Earlier CPUs, which were fully VIPT, do suffer from this problem
because they have no PCI coherence, but they all operate at a line size
of 16 anyway, so the current ARCH_DMA_MINALIGN of 16 works for them.

> If the CPU doesn't speculatively mark cache lines as dirty, then
> increasing ARCH_DMA_MINALIGN would be a sufficient solution.

Well, if you want to benchmark it, it's relatively safe to try without
exploding all our hashed spinlocks, because the LDCW alignment isn't
tied to this (it's a separate #define in ldcw.h).

James
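
The line-sharing question discussed above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
kernel code: MINALIGN_L1 and MINALIGN_L2 are hypothetical stand-ins for
an ARCH_DMA_MINALIGN of 16 and 128, and align_up() and shares_line()
are made-up helpers that check whether a DMA buffer and a neighbouring
object would land in the same cache line for a given line size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MINALIGN_L1  16u   /* L1 coherency granule mentioned in the thread */
#define MINALIGN_L2 128u   /* proposed value matching the L2 line size */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Return true if the byte ranges [a, a + alen) and [b, b + blen)
 * touch at least one common cache line of size `line`.
 */
static bool shares_line(uintptr_t a, size_t alen,
                        uintptr_t b, size_t blen, uintptr_t line)
{
    assert(alen > 0 && blen > 0 && line > 0);
    uintptr_t a_first = a / line, a_last = (a + alen - 1) / line;
    uintptr_t b_first = b / line, b_last = (b + blen - 1) / line;
    return a_first <= b_last && b_first <= a_last;
}
```

With a 16-byte DMA buffer at 0x1000 and a 16-byte object right after it
at 0x1010, the two are in separate lines at a granule of 16 but share a
line at 128; moving the object to align_up(0x1010, MINALIGN_L2)
separates them again. Whether that sharing matters on real hardware
depends on the coherency behaviour described in the mail, which this
sketch does not model.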