Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64


 



On Wed, Jan 24, 2024 at 03:26:34PM -0400, Jason Gunthorpe wrote:

> The suggestion that it should not have any interleaving instructions
> and use STP came from our CPU architecture team.

I got some more details here.

They point to the ARM publication on write combining:

https://community.arm.com/cfs-file/__key/telligent-evolution-components-attachments/13-150-00-00-00-00-10-12/Understanding_5F00_Write_5F00_Combining_5F00_on_5F00_Arm_5F00_V.1.0.pdf

specifically the example code that uses 4x 128-bit NEON stores.

They point at the actual CPU design and say it is optimized for 128-bit
stores (STP and ST4 included, it seems).

64-bit stores trigger different behavior.
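To make the STP vs. STR question concrete, here is a minimal sketch of the two ways to push one 64-byte block from arm64 inline assembly: four back-to-back 128-bit STP pair stores with nothing interleaved, versus eight 64-bit STR stores. The helpers copy64_stp() and copy64_str() are hypothetical and are not the code from this patch series. The sketch assumes a 64-byte aligned destination and makes no claim about which form a given CPU combines better; that is exactly the open question. Off arm64 it falls back to plain volatile 64-bit stores so it still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers, not the kernel's memcpy_toio_64(). Each copies
 * one 64-byte block to a (presumed write-combining) destination.
 * Assumption: dst is 64-byte aligned so the block maps to one WC buffer.
 */

/* Portable path: plain volatile 64-bit stores, no STP/STR guarantees. */
static void copy64_fallback(volatile void *dst, const uint64_t w[8])
{
	volatile uint64_t *d = dst;

	for (int i = 0; i < 8; i++)
		d[i] = w[i];
}

/* Four back-to-back 128-bit STP stores, no instructions in between. */
static void copy64_stp(volatile void *dst, const void *src)
{
	uint64_t w[8];

	assert(((uintptr_t)dst & 63) == 0);
	memcpy(w, src, sizeof(w));	/* source words go to registers first */
#if defined(__aarch64__)
	__asm__ volatile(
		"stp %x[w0], %x[w1], [%[d], #0]\n\t"
		"stp %x[w2], %x[w3], [%[d], #16]\n\t"
		"stp %x[w4], %x[w5], [%[d], #32]\n\t"
		"stp %x[w6], %x[w7], [%[d], #48]\n\t"
		:
		: [d] "r"(dst),
		  [w0] "r"(w[0]), [w1] "r"(w[1]), [w2] "r"(w[2]), [w3] "r"(w[3]),
		  [w4] "r"(w[4]), [w5] "r"(w[5]), [w6] "r"(w[6]), [w7] "r"(w[7])
		: "memory");
#else
	copy64_fallback(dst, w);
#endif
}

/* Eight 64-bit STR stores: the "str version" under discussion. */
static void copy64_str(volatile void *dst, const void *src)
{
	uint64_t w[8];

	assert(((uintptr_t)dst & 63) == 0);
	memcpy(w, src, sizeof(w));
#if defined(__aarch64__)
	__asm__ volatile(
		"str %x[w0], [%[d], #0]\n\t"
		"str %x[w1], [%[d], #8]\n\t"
		"str %x[w2], [%[d], #16]\n\t"
		"str %x[w3], [%[d], #24]\n\t"
		"str %x[w4], [%[d], #32]\n\t"
		"str %x[w5], [%[d], #40]\n\t"
		"str %x[w6], [%[d], #48]\n\t"
		"str %x[w7], [%[d], #56]\n\t"
		:
		: [d] "r"(dst),
		  [w0] "r"(w[0]), [w1] "r"(w[1]), [w2] "r"(w[2]), [w3] "r"(w[3]),
		  [w4] "r"(w[4]), [w5] "r"(w[5]), [w6] "r"(w[6]), [w7] "r"(w[7])
		: "memory");
#else
	copy64_fallback(dst, w);
#endif
}
```

Both variants leave the same 64 bytes at dst. The only difference is the store width the CPU sees, which is what the CPU team says matters for combining.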

I have no way to know if it will be OK for other drivers that expect
this to be a performance path in the kernel.

Are you *sure* you want to go with the STR version? If it works for mlx5 I
will send the patch, and the other companies can come later with
performance data.

Jason



