On Thu, 2020-09-10 at 14:10 -0300, Jason Gunthorpe wrote:
> Can you explain what this actually does on ARM?
>
> Can it ever speculate loads across page boundaries, or speculate loads
> that never exist in the program? i.e. will we get random unpredictable
> MemRds?

Probably. At least on powerpc you will as well; that's the only way to
get write combining.

> Does it/could it "combine writes"?

I assume so for ARM, definitely for powerpc.

> > > If the CPU fails to generate a 64 byte TLP then the device will still
> > > operate correctly but does a different, slower, flow.
> >
> > Side note: on ARM that TLP is not a native interconnect transaction,
> > reworded, it depends on what the system-bus->PCI logic does in
> > this respect.
>
> I think the issue is that ARM never defined what the bits set by
> pgprot_writecombine() do at a system level so we see implementations
> that do not cause write combining at the PCI-E interface for those
> bits.

Nobody did (I assume, from what I've heard). I think only x86 has a real
"write combine" attribute.

I tried to untangle that mess years ago and didn't get to the bottom of
it, but basically, on non-x86 archs, pgprot_writecombine will give you
what you asked ... and more.

> > That's why I looped you in - that's what worries me about "enabling"
> > arch_can_pci_mmap_wc() on arm64. If we enable it and we have perf
> > regressions that's not OK.
> >
> > Or we *can* enable arch_can_pci_mmap_wc() but force the mellanox
> > driver (or more broadly all drivers following this message push
> > semantics) to use "something else" for WC detection.
>
> arch_can_pci_mmap_wc() really only controls the sysfs resource file
> and it seems very unclear who in userspace uses that these days.

DPDK under some circumstances, AFAIK.

> vfio is now the right way to do that stuff. I don't see an obvious
> way to get WC memory in VFIO though...

Which would be a performance issue on a number of things, I suppose...

Cheers,
Ben.
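For reference, here is a minimal userspace sketch of the sysfs path that arch_can_pci_mmap_wc() gates. It assumes the usual /sys/bus/pci/devices/<BDF>/resourceN and resourceN_wc files; as I understand pci-sysfs, the _wc variant is only created when arch_can_pci_mmap_wc() is true and the BAR is prefetchable. The helper names pci_resource_path() and map_bar() are hypothetical, and this is not how DPDK or any particular tool actually does it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: build "/sys/bus/pci/devices/<bdf>/resource<bar>"
 * (or ".../resource<bar>_wc" when wc is non-zero) into buf.
 * Returns 0 on success, -1 if bdf is empty, contains a '/', or the
 * result does not fit in buf. */
static int pci_resource_path(char *buf, size_t len, const char *bdf,
                             int bar, int wc)
{
    if (bdf[0] == '\0' || strchr(bdf, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d%s",
                     bdf, bar, wc ? "_wc" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: map a BAR, preferring the write-combined
 * resource<bar>_wc file and falling back to the plain (uncached)
 * resource<bar> file if the _wc file cannot be opened. On success,
 * *size is the file (BAR) size and *is_wc says which file was used;
 * on failure MAP_FAILED is returned and the outputs are untouched. */
static void *map_bar(const char *bdf, int bar, size_t *size, int *is_wc)
{
    char path[256];

    assert(size != NULL && is_wc != NULL);
    for (int wc = 1; wc >= 0; wc--) {
        if (pci_resource_path(path, sizeof(path), bdf, bar, wc) < 0)
            return MAP_FAILED;
        int fd = open(path, O_RDWR);
        if (fd < 0)
            continue;               /* e.g. no _wc file: try the plain one */
        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size <= 0) {
            close(fd);
            return MAP_FAILED;
        }
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping stays valid after close */
        if (p == MAP_FAILED)
            return MAP_FAILED;
        *size = (size_t)st.st_size;
        *is_wc = wc;
        return p;
    }
    return MAP_FAILED;
}
```

Note that *is_wc only tells you which sysfs file the kernel let you map. It says nothing about whether the CPU and the system-bus-to-PCI bridge actually combine the stores into large TLPs, which is exactly the uncertainty discussed above for non-x86 archs.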