On Tue, Dec 05, 2023 at 03:51:30PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 05, 2023 at 07:34:45PM +0000, Catalin Marinas wrote:
> > > 2) You want to #define __iowrite512_copy() to memcpy_toio() on ARM and
> > >    implement some quad STP optimization for this case?
> >
> > We can have the generic __iowrite512_copy() do memcpy_toio() and have
> > the arm64 implement an optimised version.
> >
> > What I'm not entirely sure of is the DGH (whatever the io_* barrier name
> > is). I'd put it in the same __iowrite512_copy() function and remove it
> > from the driver code. Otherwise when ST64B is added, we have an
> > unnecessary DGH in the driver. If this does not match the other
> > __iowrite*_copy() semantics, we can come up with another name. But start
> > with this for now and document the function.
>
> I think the iowrite is only used for WC and the DGH is functionally
> harmless for non-WC, so it makes sense.
>
> In this case we should just remove the DGH macro from the generic
> architecture code and tell people to use iowrite - since we now
> understand that callers basically have to in order to use DGH on new
> ARM CPUs.

That works for me but what would the semantics be for
__iowrite64_copy() for example? Is there a DGH at the end of the whole
write or after each iteration? I'd go with the former since e.g.
hns3_tx_push_bd() does that (and doesn't seem to be a 64 byte copy).

Similarly for __iowrite512_copy(), if you want the DGH after each
iteration you should only pass a count of 1.

-- 
Catalin
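
To make the "DGH at the end of the whole write" semantics concrete, here is a rough userspace sketch, not kernel code and not a proposed patch. The names mock_dgh() and sketch_iowrite64_copy() are hypothetical stand-ins: the real helper takes an __iomem pointer, and the real barrier would be the arm64 DGH hint rather than a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the arm64 DGH hint: counts calls instead. */
static unsigned int mock_dgh_calls;

static void mock_dgh(void)
{
	mock_dgh_calls++;
}

/*
 * Userspace mock of a 64-bit copy helper with the "former" semantics
 * discussed above: write all 'count' elements, then issue a single DGH
 * for the whole copy. A plain volatile pointer stands in for __iomem.
 */
static void sketch_iowrite64_copy(volatile uint64_t *to,
				  const uint64_t *from, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		to[i] = from[i];

	mock_dgh();	/* once per call, not once per element */
}
```

With this shape, a caller that wants a DGH after every element calls the helper in a loop with a count of 1, and a caller that passes the full count gets exactly one DGH at the end.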