On Thu, Sep 17, 2020 at 09:59:28AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2020-09-16 at 09:12 -0300, Jason Gunthorpe wrote:
> > > Also we could make this a variable rather than a constant and
> > > choose a more appropriate set of flags at boot time....
> >
> > It is a function, so it could check the CPU ID for the known broken
> > devices and block them.
>
> Sure, I meant in the abstract way. It's not a hot path so it doesn't
> have to be a static key.
>
> > > > > Why would that be a regression ?
> > > >
> > > > Using the WC submission flow when it doesn't work costs something
> > > > like 10% performance vs using the non-WC flow.
> > >
> > > You mean the driver uses a different path to the HW which has that
> > > overhead, not that MMIOs have that overhead right ?
> >
> > The different path has overhead of doing extra useless MMIOs because
> > they don't combine
>
> I see. This might have to end up being a TX2 specific hack until the
> end of times...

True - hopefully on platforms that implement normal NC the architectural
way will not trigger user space performance regressions.

Unfortunately if we merge this patch we _do_ know from this thread
that userspace will suffer from a perf regression on TX2. Either we
ignore it or we write some code to prevent it (i.e. as a first step make
arch_can_pci_mmap_wc() return 0 on TX2 - possibly using the arm64 errata
detection mechanism).

Adding a new IO mapping API and using it in IB drivers won't solve the
TX2 problem - since we still prefer normal NC over device GRE for "WC"
mappings and we would have to "downgrade" TX2 somehow.

Lorenzo
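
To make the "return 0 on TX2" idea concrete, here is a rough user-space
sketch of the check, not a kernel patch. It only decodes a MIDR_EL1 value
and decides whether WC mmap would be allowed. The function names
(midr_is_tx2, can_pci_mmap_wc_sketch) are hypothetical, and the
implementer/part numbers are what I believe the kernel's cputype.h uses for
ThunderX2 and Broadcom Vulcan, so treat them as assumptions to verify. A
real change would more likely hook into the arm64 errata detection
mechanism than open-code a MIDR comparison.

```c
#include <stdint.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

/* Assumed IDs (verify against arch/arm64/include/asm/cputype.h). */
#define IMP_CAVIUM      0x43u
#define IMP_BRCM        0x42u
#define PART_THUNDERX2  0x0afu
#define PART_VULCAN     0x516u

/* Hypothetical helper: non-zero if the MIDR identifies a TX2-class core. */
int midr_is_tx2(uint32_t midr)
{
	uint32_t imp = MIDR_IMPLEMENTER(midr);
	uint32_t part = MIDR_PARTNUM(midr);

	return (imp == IMP_CAVIUM && part == PART_THUNDERX2) ||
	       (imp == IMP_BRCM && part == PART_VULCAN);
}

/*
 * Hypothetical stand-in for arch_can_pci_mmap_wc(): report 0 (no WC mmap)
 * on TX2, where WC submission does not combine, and 1 everywhere else.
 */
int can_pci_mmap_wc_sketch(uint32_t midr)
{
	return !midr_is_tx2(midr);
}
```

With this, can_pci_mmap_wc_sketch() returns 0 for a ThunderX2 MIDR such as
0x431f0af1 and 1 for, say, a Cortex-A72 MIDR such as 0x410fd080.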