Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64

On Thu, Jan 25, 2024 at 01:43:33PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2024 at 03:26:34PM -0400, Jason Gunthorpe wrote:
> > The suggestion that it should not have any interleaving instructions
> > and use STP came from our CPU architecture team.
> 
> I got some more details here.
> 
> They point to the ARM publication about write combining
> 
> https://community.arm.com/cfs-file/__key/telligent-evolution-components-attachments/13-150-00-00-00-00-10-12/Understanding_5F00_Write_5F00_Combining_5F00_on_5F00_Arm_5F00_V.1.0.pdf
> 
> specifically to the example code using 4x 128 bit NEON stores.

That's one example, but the document doesn't make any statements about
64-bit writes.

> They point at the actual CPU design and say it is optimized for 128
> bit stores (STP and ST4 included, it seems).
> 
> 64 bit stores trigger some different behavior.

This is highly microarchitecture-specific. The best bet for the future
is the ST64B instruction, but in the meantime it's pretty much guesswork.

> I have no way to know if it will be OK for other drivers that expect
> this to be a performance path in the kernel.
> 
> Are you *sure* you want to do this str version? If it works for mlx5 I
> will send the patch and the other companies can come later with
> performance data.

Yeah, I'd stick to STR for now. It keeps things simpler, as we don't
have to care about what emulation does.
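
To make the shape of the STR variant concrete, here is a minimal sketch
in plain C, assuming a 64-byte block written as eight independent 64-bit
stores. The name copy64_sketch is hypothetical and this is not the code
from the patch; a real arm64 implementation would likely pin the
instruction choice with inline assembly, which this portable stand-in
does not attempt.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the function from the patch): copy one
 * 64-byte block as eight separate 64-bit stores.
 *
 * The volatile qualifier keeps the compiler from dropping or merging
 * the stores in this sketch. It does not guarantee which store
 * instruction is selected, and it says nothing about whether the CPU
 * write-combines the stores on the way to the device.
 */
static inline void copy64_sketch(volatile uint64_t *to, const uint64_t *from)
{
	for (int i = 0; i < 8; i++)
		to[i] = from[i];	/* one 64-bit store per iteration */
}
```

Each store here is a single naturally aligned 64-bit access, which is
the property that keeps the emulation case simple.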

-- 
Catalin



