On Thu, 23 Jan 2025 13:16:03 -0800
Eric Biggers <ebiggers@xxxxxxxxxx> wrote:

> On Thu, Jan 23, 2025 at 08:58:10PM +0000, David Laight wrote:
...
> > For a small memory footprint it might be worth considering 4 bits at a time.
> > So a 16 word (64 byte) lookup table.
> > Thinks....
> > You can xor a data byte onto the crc 'accumulator' and then do two separate
> > table lookups for each of the high nibbles and xor both onto it before the rotate.
> > That is probably a reasonable compromise.
>
> Yes, you can do less than a byte at a time (currently one of the choices is even
> one *bit* at a time!), but I think byte-at-a-time is small enough already.

I used '1 bit at a time' for a crc64 of a 5MB file.
It was actually fast enough during a 'compile' phase (verified by a serial eeprom).

But the paired nibble one is something like:
	crc ^= *data++ << 24;
	crc ^= table[crc >> 28] ^ table1[(crc >> 24) & 15];
	crc = rol(crc, 8);
which isn't going to be significantly slower than the byte one,
where the middle line is:
	crc ^= table[crc >> 24];
especially for a multi-issue cpu, and the tables drop from 1k to 128 bytes.
That avoids quite a lot of D-cache misses.
(Since you'll probably get them all twice when the program's working set
is reloaded!)

Actually you need to rol() the table[]s.
Then do:
	crc = rol(crc, 8) ^ table[] ...
to reduce the register dependency chain to 5 per byte.

	David
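
For reference, the paired-nibble scheme with pre-rotated tables might look roughly like the sketch below. It is not the kernel code: it assumes an MSB-first 32-bit CRC with the polynomial 0x04C11DB7, and the names (tab_hi, tab_lo, crc32_nibble_init, crc32_nibble_pair) are made up for illustration. The tables are built at runtime from a bit-at-a-time reference, and each entry has its index byte xor'd in so that the byte rotated into the bottom of the accumulator is cancelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04C11DB7u	/* assumed polynomial, MSB-first */

static uint32_t tab_hi[16];	/* indexed by the high nibble of the top byte */
static uint32_t tab_lo[16];	/* indexed by the low nibble of the top byte */

static uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* Reference implementation: one bit at a time. */
static uint32_t crc32_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		for (int i = 0; i < 8; i++)
			crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC32_POLY : 0);
	}
	return crc;
}

/*
 * Entry for top-byte value b: the usual byte-table value, xor'd with b
 * so that it also clears the byte that rol32() moves into bits 0-7.
 */
static uint32_t rotated_entry(uint8_t b)
{
	return crc32_bitwise(0, &b, 1) ^ b;
}

/* Build the two 16-entry tables (2 * 16 * 4 = 128 bytes). */
static void crc32_nibble_init(void)
{
	for (unsigned int i = 0; i < 16; i++) {
		tab_hi[i] = rotated_entry((uint8_t)(i << 4));
		tab_lo[i] = rotated_entry((uint8_t)i);
	}
	assert(tab_hi[0] == 0 && tab_lo[0] == 0);
}

/* One rotate and two small lookups per byte; the lookups are independent. */
static uint32_t crc32_nibble_pair(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		crc = rol32(crc, 8) ^ tab_hi[crc >> 28] ^ tab_lo[(crc >> 24) & 15];
	}
	return crc;
}
```

Splitting the lookup works because the CRC is linear over GF(2): the entry for a whole byte is the xor of the entries for its high and low nibbles. Call crc32_nibble_init() once before use; the result should match crc32_bitwise() for the same initial value and data.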