"Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/10/03 18:00:36: > > On Sat, Oct 01, 2011 at 03:52:00PM +0200, Joakim Tjernlund wrote: > > "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/09/30 18:12:23: > > > > > > [putting mailing lists on cc] [SNIP] > > > > > > <shrug> I suppose I could make CRC32C_BITS configurable. What is the hardware > > > profile of your ppc32 processor? How much L1D/L2 cache? slice-by-8 does have > > > a big cache footprint. On the other hand it's faster than the slice-by-4 > > > (crc32) and Sarwate (crc32c) code in the kernel, even on old slow 32-bit x86 > > > processors (PII, PIII, P4). > > > > It is a low end embedded 333 MHz CPU with only L1 cache. How much faster > > is slice by 8 than slice by 4 on these old x86 machines? > > How much L1 cache? Or, if you'd rather not give away specifics, has the CPU > more than 8KB L1 cache? I'm willing to concede that with little cache the > added memory pressure could be painful. > > As for the old x86 machines, please have a look at: > http://djwong.org/docs/ext4_metadata_checksums.html#Benchmarking > > ~15% faster on a 2GHz Via C7 > ~20% faster on a 2.7GHz P4 > ~25% faster on a 500MHz P3 > > I vaguely recall it was ~20% faster on a 400MHz P2, but all the kernel.org > wikis are still down. :( > > So I suspect the key factor here is memory hierachy, since all of those systems > have at least 16K of L1 cache. Slice by 8 might actually suck on a Pentium > Proor earlier. Unfortunately I don't have anything older than a PII... It is 16KB cache on this CPU. I don't know why it was so much slower. Could be a gcc thing as gcc does a fairly lame job at optimizing crc32. Still think making this configurable is a good thing. At least until the verdict is in from other CPUs. Jocke -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html