On Wed, 5 Mar 2025 11:16:08 -0800 Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > On Wed, Mar 05, 2025 at 02:26:53PM +0000, David Laight wrote: > > On Tue, 4 Mar 2025 13:32:16 -0800 > > Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > > > > From: Eric Biggers <ebiggers@xxxxxxxxxx> > > > > > > For handling the 0 <= len < sizeof(unsigned long) bytes left at the end, > > > do a 4-2-1 step-down instead of a byte-at-a-time loop. This allows > > > taking advantage of wider CRC instructions. Note that crc32c-3way.S > > > already uses this same optimization too. > > > > An alternative is to add extra zero bytes at the start of the buffer. > > They don't affect the crc and just need the first 8 bytes shifted left. > > > > I think any non-zero 'crc-in' just needs to be xor'ed over the first > > 4 actual data bytes. > > (It's over 40 years since I did the maths of CRC.) ... > > David > > Sure, but that only works when len >= sizeof(unsigned long). Also, the initial > CRC sometimes has to be divided between two unsigned longs. Yes, I was thinking that might make it a bit more tricky. I need to find some spare time :-) I wasn't taught anything about using non-carry multiplies either. And I can't remember the relevant 'number field' stuff either. But (with no-carry maths) I think you have: crc(n + 1) = (crc(n) + data(n)) * poly If data(n+1) and data(n+2) are zero (handled elsewhere) you have: crc(n + 3) = (((crc(n) + data(n)) * poly) * poly) * poly I think that because it is a field this is the same as crc(n + 3) = (crc(n) + data(n)) * (poly * poly * poly) which is just a different crc polynomial. If true your '3-way' cpu doesn't have to use big blocks. I guess I'm missing something. David