On Fri, 2023-11-24 at 12:06 -0400, Jason Gunthorpe wrote:
> On Fri, Nov 24, 2023 at 04:59:38PM +0100, Niklas Schnelle wrote:
>
> > This should be as easy as adding
> >
> > #define memcpy_toio_64(to, from) zpci_memcpy_toio(to, from, 64)
> >
> > to arch/s390/include/asm/io.h. I'm wondering if we should do that as
> > part of this series. It's not as good as a special case but probably
> > better than the existing loop.
>
> Makes sense

Ok, I overlooked the obvious. Let's make that:

#define memcpy_toio_64(dst, src) zpci_write_block(dst, src, 64)

>
> > I don't think we have any existing in-kernel users of memcpy_toio() on
> > s390 so far though so I'd like to give this some extra testing. Could
> > you share instructions on how to exercise the code path of patch 2 on a
> > ConnectX-5 or 6? Is this exercised e.g. when using NVMe-oF RDMA?
>
> Simply boot and look at pr_debug from mlx5 to see if writecombining is
> on or off - you want to see on.
>
> Thanks,
> Jason

With the above zpci_write_block(dst, src, 64) we get a PCI store block
without any extra alignment treatment, i.e. exactly what we want for
memcpy_toio_64(). If the alignment is wrong, the PCI store block
instruction will fail, the PCI function will be isolated, and we log an
error, so I don't see a need for checks there either.

As an aside, it looks like our zpci_memcpy_toio() is wrongly looking for
tighter than 8 byte alignment on the source address and would issue a
series of 8 stores. Still looking into that.

I also tested this with our only privileged (kernel only) PCI stores and
that works too. Also, it turns out the writeq() loop we had so far does
not produce the needed 64 byte TLP on s390 either, so this actually
makes us newly pass this test.

Thanks,
Niklas
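
P.S. For anyone following along, here is a minimal userspace sketch of
the difference between the writeq() loop and the block write. It is a
mock under stated assumptions, not kernel code: struct mock_bus,
mock_writeq(), mock_write_block(), copy64_writeq_loop() and
mock_memcpy_toio_64() are hypothetical names made up for illustration,
and the "writes" counter only counts store operations issued by the
caller. It says nothing about what TLPs real hardware would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mock only: a 64-byte "MMIO window" plus a store counter. */
struct mock_bus {
	uint8_t mmio[64];	/* stand-in for device memory */
	unsigned int writes;	/* number of separate stores issued */
};

/* Hypothetical stand-in for writeq(): one 8-byte store per call. */
static void mock_writeq(struct mock_bus *bus, size_t off, uint64_t val)
{
	assert(off + sizeof(val) <= sizeof(bus->mmio));
	memcpy(bus->mmio + off, &val, sizeof(val));
	bus->writes++;
}

/*
 * Hypothetical stand-in for a block store: the whole buffer goes out
 * in a single operation, with no alignment fixups or splitting.
 */
static void mock_write_block(struct mock_bus *bus, const void *src, size_t len)
{
	assert(len <= sizeof(bus->mmio));
	memcpy(bus->mmio, src, len);
	bus->writes++;
}

/* The loop approach: eight 8-byte stores to cover 64 bytes. */
static void copy64_writeq_loop(struct mock_bus *bus, const uint64_t *src)
{
	for (size_t i = 0; i < 8; i++)
		mock_writeq(bus, i * 8, src[i]);
}

/* Same shape as the proposed macro, but pointed at the mock. */
#define mock_memcpy_toio_64(bus, src) mock_write_block(bus, src, 64)
```

Both paths leave the same 64 bytes in the mock window. The only thing
that differs is how many separate stores were issued to get there, which
is the property the write-combining test cares about.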