On Wed, Sep 16, 2020 at 09:48:52AM +0100, Catalin Marinas wrote:
> On Wed, Sep 16, 2020 at 09:33:16AM +0100, Will Deacon wrote:
> > On Tue, Sep 15, 2020 at 08:40:06PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 16, 2020 at 09:17:38AM +1000, Benjamin Herrenschmidt wrote:
> > > > On Tue, 2020-09-15 at 08:05 -0300, Jason Gunthorpe wrote:
> > > > > > To sum it up:
> > > > > >
> > > > > > (1) RDMA drivers need a new mapping function/attribute to define their
> > > > > > message push model. Actually the message model is not necessarily related
> > > > > > to write combining a la x86, so we should probably come up with a better
> > > > > > and consistent naming. Enabling this patchset may trigger performance
> > > > > > regressions on mellanox drivers on arm64 - this ought to be
> > > > > > addressed.
> > > > >
> > > > > It is pretty clear now that the certain ARM chips that don't do write
> > > > > combining with pgprot_writecombine will performance regress if they
> > > > > are running a certain uncommon Mellanox configuration. I suspect these
> > > > > deployments are all running the out of tree patch for DEVICE_GRE
> > > > > though.
> > > >
> > > > I'm not sure I understand...
> > > >
> > > > Today those ARM chips will not use pgprot_writecombine (at least not
> > > > using that code path, they might still use it as the result of the
> > > > other path in the driver that can enable it).
> > >
> > > Not quite, upstream kernel will never use WC on those
> > > devices. DEVICE_GRE is not supported in upstream,
> > > arch_can_pci_mmap_wc() is always false and the WC tester will always
> > > fail.
> > >
> > > > With the patch, those device will now use MT_DEVICE_NC.
> > >
> > > Which doesn't do WC at all on some ARM implementations.
> >
> > Is that just TX2? I remember that thing being weird where GRE performed
> > better than NC, but I thought that was a one off (and the thing is dead).
>
> I recall something along these lines. Hopefully ARM updated the guidance
> to licensees.
>
> > NC is more permissive than GRE, so I think that's the right one to use; i.e.
> > we go for the fewest number of restrictions on the hardware. If somebody
> > screws up the uarch, that's up to them.
>
> I agree, Normal NC is better as long as the BAR can tolerate read
> side-effects.

That we don't know but if a prefetchable BAR can't tolerate read side
effects this would be already a problem on eg x86 - that's the best we
can hope for given the current PCI specs.

+1 on normal NC. The only open point is whether we should make
arch_can_pci_mmap_wc() return false on platforms like TX2.

Thanks,
Lorenzo