Re: [PATCH] x86/crc32: optimize tail handling for crc32c short inputs

On Tue,  4 Mar 2025 13:32:16 -0800
Eric Biggers <ebiggers@xxxxxxxxxx> wrote:

> From: Eric Biggers <ebiggers@xxxxxxxxxx>
> 
> For handling the 0 <= len < sizeof(unsigned long) bytes left at the end,
> do a 4-2-1 step-down instead of a byte-at-a-time loop.  This allows
> taking advantage of wider CRC instructions.  Note that crc32c-3way.S
> already uses this same optimization too.
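
For reference, a minimal portable sketch of the 4-2-1 step-down described
above, next to the byte-at-a-time loop it replaces. This is not the code
from the patch: crc32c_step() is a hypothetical bitwise stand-in for the
8/16/32-bit forms of the x86 crc32 instruction, the function names are made
up for illustration, and sizeof(unsigned long) is assumed to be 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the crc32 instruction: fold the low 'nbits'
 * bits of 'data' into 'crc' (reflected CRC32C, polynomial 0x82F63B78). */
static uint32_t crc32c_step(uint32_t crc, uint32_t data, int nbits)
{
	crc ^= data;
	for (int i = 0; i < nbits; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Byte-at-a-time tail loop: up to 7 dependent steps. */
static uint32_t crc32c_tail_bytes(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = crc32c_step(crc, *p++, 8);
	return crc;
}

/* 4-2-1 step-down for 0 <= len < 8: at most 3 steps. */
static uint32_t crc32c_tail_421(uint32_t crc, const uint8_t *p, size_t len)
{
	assert(len < 8);
	if (len & 4) {
		/* explicit little-endian load of 4 bytes */
		uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
			     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
		crc = crc32c_step(crc, v, 32);
		p += 4;
	}
	if (len & 2) {
		uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
		crc = crc32c_step(crc, v, 16);
		p += 2;
	}
	if (len & 1)
		crc = crc32c_step(crc, p[0], 8);
	return crc;
}
```

Both functions take and return the raw CRC state, so any initial or final
inversion is left to the caller.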

An alternative is to add extra zero bytes at the start of the buffer.
Leading zeros don't affect the CRC as long as the running CRC is zero,
so only the first 8 bytes need shifting left to make room for them.

I think any non-zero 'crc-in' just needs to be xor'ed over the first
4 actual data bytes.
(It's over 40 years since I did the maths of CRC.)
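
A rough sketch of that identity, restricted to a single 8-byte block to
keep it short. The names are hypothetical and the CRC is a plain bitwise
CRC32C rather than the crc32 instruction. It assumes at least 4 data bytes
are present, so the whole 'crc-in' has somewhere to land; shorter inputs
are not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC32C over a byte buffer (reflected, poly 0x82F63B78).
 * Takes and returns the raw state, with no initial or final inversion. */
static uint32_t crc32c_bytes(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical helper: CRC of 4..8 bytes computed as one zero-prefixed
 * 8-byte block, starting from a zero CRC state. */
static uint32_t crc32c_zero_prefixed(uint32_t crc, const uint8_t *p, size_t len)
{
	uint8_t block[8] = { 0 };
	size_t pad;

	assert(len >= 4 && len <= 8);
	pad = 8 - len;

	/* Data goes after 'pad' leading zero bytes. */
	for (size_t i = 0; i < len; i++)
		block[pad + i] = p[i];

	/* XOR crc-in (little-endian) over the first 4 actual data bytes. */
	for (size_t i = 0; i < 4; i++)
		block[pad + i] ^= (uint8_t)(crc >> (8 * i));

	/* The leading zeros leave a zero state unchanged. */
	return crc32c_bytes(0, block, 8);
}
```

For any 4 <= len <= 8 this should give the same result as running
crc32c_bytes(crc, p, len) directly.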

You won't notice the misaligned accesses all down the buffer.
When I was testing different ipcsum code, misaligned buffers
cost less than 1 clock per cache line.
I think that was even true for the versions that managed 12 bytes
per clock (including the one Linus committed).

	David



