On Tue, 2020-09-15 at 08:05 -0300, Jason Gunthorpe wrote:
> > To sum it up:
> >
> > (1) RDMA drivers need a new mapping function/attribute to define their
> > message push model. Actually the message model is not necessarily related
> > to write combining a la x86, so we should probably come up with a better
> > and consistent naming. Enabling this patchset may trigger performance
> > regressions on mellanox drivers on arm64 - this ought to be
> > addressed.
>
> It is pretty clear now that the certain ARM chips that don't do write
> combining with pgprot_writecombine will performance regress if they
> are running a certain uncommon Mellanox configuration. I suspect these
> deployments are all running the out of tree patch for DEVICE_GRE
> though.

I'm not sure I understand...

Today those ARM chips will not use pgprot_writecombine (at least not
via that code path; they might still use it as a result of the other
path in the driver that can enable it). So they get MT_DEVICE_nGnRnE
(unless I missed something here), and so they will not combine.

With the patch, those devices will now use MT_DEVICE_NC.

Why would that be a regression? It will allow speculation, but that
doesn't necessarily mean that the CPU will cause spurious accesses; it
probably won't in most cases... And it should allow combining, no?

BTW, Lorenzo, why don't we use MT_DEVICE_GRE for pgprot_writecombine?
Is it not supported on some chips?

Note that this led me to discover another weird thing...

What on earth is pgprot_device()? Is this new? On ARM it will be
MT_DEVICE_nGnRE, so it allows posted writes. That seems to match what
ioremap does. Should ioremap then use it as well? But it's only ever
used for PCI mmap.

Why is it different from pgprot_noncached(), which disables posted
writes (nE)? I ask because a whole lot of drivers use
pgprot_noncached() explicitly in either mmap or vmap, with the
expectation that it's more or less the same as what ioremap does...

Cheers,
Ben.