On 12/15/2010 08:00 PM, David Miller wrote:
> From: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
> Date: Wed, 15 Dec 2010 15:15:09 +0200
>
>> I agree that the byte-compare or long-compare should give you very close
>> results in modern pipelined CPUs. But surely 12 increments-and-test should
>> show up against 3 (or even 2). I would say it must be a better plan.
>
> For strings of these lengths the setup code necessary to initialize
> the inner loop and the tail code to handle the sub-word ending cases
> eliminate whatever gains there are.
>

You misunderstood me. I'm saying that we know the beginning of the string
is aligned, and Nick offered to pad the last long, so neither the setup
code nor the sub-word tail handling is needed. Surely a shift by 2 (or 3)
plus the reduction of the 12 dec-and-test to 3 should give you an
optimization?

> I know this as I've been hacking on assembler optimized strcmp() and
> memcmp() in my spare time over the past year or so.

Boaz
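
To make the long-at-a-time suggestion concrete, here is a minimal sketch. It assumes both names start long-aligned and that their buffers are zero-padded up to a whole number of longs (the padding Nick offered). The helper name names_equal_padded() is hypothetical and is not taken from any kernel patch; it only illustrates how the byte length turns into a word count, with no setup code and no sub-word tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Assumptions (not checked beyond alignment):
 *   - a and b both start on a long boundary;
 *   - both buffers are zero-padded up to a multiple of sizeof(long),
 *     so reading the last, partially used long is safe and the
 *     padding bytes compare equal.
 *
 * Returns 1 if the first len bytes match, 0 otherwise.
 */
int names_equal_padded(const void *a, const void *b, size_t len)
{
    const unsigned long *wa = a;
    const unsigned long *wb = b;
    /* Round the byte length up to whole longs; the compiler turns the
     * division into a shift by 2 (32-bit long) or 3 (64-bit long). */
    size_t nwords = (len + sizeof(long) - 1) / sizeof(long);

    assert((uintptr_t)a % sizeof(long) == 0);
    assert((uintptr_t)b % sizeof(long) == 0);

    /* A 12-byte name needs 3 iterations with 4-byte longs, or 2 with
     * 8-byte longs, instead of 12 byte-wise compare-and-test steps. */
    while (nwords--) {
        if (*wa++ != *wb++)
            return 0;
    }
    return 1;
}
```

Whether this actually beats the byte loop for names this short is exactly the point under debate in this thread, so it would need to be measured rather than assumed.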