Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64

Niklas Schnelle <schnelle@xxxxxxxxxxxxx> · Mon, 27 Nov 2023 18:43:11 +0100

On Fri, 2023-11-24 at 12:06 -0400, Jason Gunthorpe wrote:
> On Fri, Nov 24, 2023 at 04:59:38PM +0100, Niklas Schnelle wrote:
>  
> > This should be as easy as adding
> > 
> > #define memcpy_toio_64(to, from) zpci_memcpy_toio(to, from, 64)
> > 
> > to arch/s390/include/asm/io.h. I'm wondering if we should do that as
> > part of this series. It's not as good as a special case but probably
> > better than the existing loop.
> 
> Makes sense

Ok, I overlooked the obvious. Let's make that:

#define memcpy_toio_64(dst, src)       zpci_write_block(dst, src, 64)

> 
> > I don't think we have any existing in-kernel users of memcpy_toio() on
> > s390 so far though so I'd like to give this some extra testing. Could
> > you share instructions on how to exercise the code path of patch 2 on a
> > ConnectX-5 or 6? Is this exercised e.g. when using NVMe-oF RDMA?
> 
> Simply boot and look at pr_debug from mlx5 to see if writecombining is
> on or off - you want to see on.
> 
> Thanks,
> Jason

With the above zpci_write_block(dst, src, 64) we get a PCI store block
without any extra alignment treatment, i.e. exactly what we want for
memcpy_toio_64(). If the alignment is wrong, the PCI store block
instruction will fail, the PCI function will be isolated, and we log an
error, so I don't see a need for checks there either. As an aside, it
looks like our zpci_memcpy_toio() is wrongly looking for tighter than 8
byte alignment on the source address and would issue a series of 8
stores. Still looking into that. I also tested this with our only
privileged (kernel only) PCI stores and that works too.
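
To illustrate the behaviour described above, here is a minimal userspace
sketch, not kernel code. mock_write_block() and mock_memcpy_toio_64() are
hypothetical stand-ins for zpci_write_block() and the proposed
memcpy_toio_64() define. The 8 byte alignment rule and the -1 return value
are assumptions made for the mock only. On real hardware a failing store
block isolates the PCI function and logs an error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for zpci_write_block(): one block store with no
 * alignment fixups and no fallback to smaller stores.
 * Assumption for this mock: destination and length must be 8 byte aligned.
 */
static int mock_write_block(void *dst, const void *src, unsigned long len)
{
	if (((uintptr_t)dst & 7) || (len & 7))
		return -1;	/* models the failing store block instruction */

	memcpy(dst, src, len);	/* models the single PCI store block */
	return 0;
}

/* Same shape as the proposed define, with the length fixed at 64 bytes. */
#define mock_memcpy_toio_64(dst, src)	mock_write_block(dst, src, 64)

/* Small usage example: an aligned copy succeeds, a misaligned one fails. */
void demo(void)
{
	uint64_t backing[16] = { 0 };		/* 128 bytes, 8 byte aligned */
	unsigned char *dst = (unsigned char *)backing;
	unsigned char src[64];

	memset(src, 0xab, sizeof(src));

	assert(mock_memcpy_toio_64(dst, src) == 0);
	assert(memcmp(dst, src, sizeof(src)) == 0);
	assert(mock_memcpy_toio_64(dst + 1, src) == -1);
}
```

The point of the sketch is that the macro adds no checks of its own. A
misaligned call simply fails in the block store, which is the behaviour
argued for above.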

Also, it turns out the writeq() loop we had so far does not produce the
needed 64 byte TLP on s390 either, so this actually makes us newly pass
this test.

Thanks,
Niklas