On Fri, Nov 24, 2023 at 03:10:29PM +0100, Niklas Schnelle wrote:

> What's the reasoning behind not using the existing memcpy_toio()
> here?

Going forward, CPUs are implementing an instruction to do a 64-byte
aligned store; this is a wrapper for exactly that operation.

memcpy_toio() is much more general: it allows unaligned buffers and
sizes that are not multiples of 64. Adapting the general version to
generate the optimized version in the cases where it can is complex
and has a codegen penalty.

> For s390 the above generic variant would do 8 of our special PCI store
> instructions while memcpy_toio() is defined to zpci_memcpy_toio() which
> can do the same as a single PCI store block instruction. Now of course
> we could provide our own memcpy_toio_64() but that would end up the
> same as just doing memcpy_toio(addr, buffer, 64) here.

This is probably better?

Jason
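
For readers following the thread, here is a rough userspace sketch of the
fixed-size operation being discussed: a copy of exactly 64 bytes to a
64-byte aligned destination, done as eight 64-bit stores. It is not the
code from the patch. The names sketch_writeq() and sketch_memcpy_toio_64()
are hypothetical stand-ins, and a plain volatile store is assumed in place
of a real MMIO accessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit MMIO store; here just a volatile write. */
static inline void sketch_writeq(uint64_t val, volatile uint64_t *addr)
{
	*addr = val;
}

/*
 * Hypothetical fixed-size copy: always 64 bytes, destination assumed to be
 * 64-byte aligned. Because size and alignment are fixed by contract, there
 * are no head/tail cases to handle, unlike a general memcpy_toio().
 */
static inline void sketch_memcpy_toio_64(volatile void *to, const void *from)
{
	volatile uint64_t *dst = to;
	unsigned int i;

	/* The caller guarantees the alignment; check it in this sketch. */
	assert(((uintptr_t)to & 63) == 0);

	for (i = 0; i < 8; i++) {
		uint64_t word;

		/* memcpy() avoids assuming anything about source alignment. */
		memcpy(&word, (const unsigned char *)from + i * 8, sizeof(word));
		sketch_writeq(word, dst + i);
	}
}
```

An architecture that has a single block-store instruction could, as
suggested in the quoted text, supply its own version of such a helper
instead of the eight-store loop.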