On Sun, Dec 19, 2010 at 9:54 AM, George Spelvin <linux@xxxxxxxxxxx> wrote:
>> static inline int dentry_memcmp_long(const unsigned char *cs,
>>                 const unsigned char *ct, ssize_t count)
>> {
>>         int ret;
>>         const unsigned long *ls = (const unsigned long *)cs;
>>         const unsigned long *lt = (const unsigned long *)ct;
>>
>>         while (count > 8) {
>>                 ret = (*cs != *ct);
>>                 if (ret)
>>                         break;
>>                 cs++;
>>                 ct++;
>>                 count -= 8;
>>         }
>>         if (count) {
>>                 unsigned long t = *ct & ((0xffffffffffffffff >> ((8 - count) * 8))
>>                 ret = (*cs != t)
>>         }
>>
>>         return ret;
>> }
>
> First, let's get the code right, and use correct types, but also, there

You still used the wrong vars in the loop.

> are some tricks to reduce the masking cost.
>
> As long as you have to mask one string, *and* don't have to worry about
> running off the end of mapped memory, there's no additional cost to
> masking both in the loop. Just test (a ^ b) & mask. Using a lookup table

I considered it, but maybe not well enough. It is another cacheline, but
one common to all lookups, so it could well be worth it; let's keep your
code around...

The big problem for CPUs that don't do well on this type of code is what
the string goes through during the entire syscall. First, a byte-by-byte
strncpy_from_user of the whole name string into kernel space. Then
byte-by-byte chunking and hashing of the component paths according to
'/'. Then a byte-by-byte memcmp against the dentry name.

I'd love to do everything with 8-byte loads: do the component separation
and hashing at the same time as the copy from user, and have the padded
and aligned component strings and their hashes available. But complexity.

On my Westmere system, the time to do a stat is 640 cycles plus 10
cycles for every byte in the string (this cost holds perfectly from
1-byte names up to 32-byte names in my test range). Average path name
strings in a `git diff` workload are 31 bytes, although that workload is
much less cache friendly and spans several components (my test is just a
single component).
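For reference, here is a minimal user-space sketch (mine, not the patch
itself) of what the loop looks like once the long pointers are actually
used in the body and George's (a ^ b) & mask trick is applied to the
tail. It assumes both names are word-aligned and zero-padded out to a
multiple of sizeof(long), as the surrounding discussion requires, and a
little-endian machine for the tail mask:

```c
#include <stddef.h>

/* Sketch only: compare `count` bytes of two word-aligned, zero-padded
 * buffers a word at a time.  Returns 0 if equal, nonzero otherwise.
 * The tail mask keeps the low `count % sizeof(long)` bytes, which is
 * correct on little-endian. */
static inline int dentry_memcmp_long(const unsigned char *cs,
                                     const unsigned char *ct, size_t count)
{
        const unsigned long *ls = (const unsigned long *)cs;
        const unsigned long *lt = (const unsigned long *)ct;

        while (count >= sizeof(unsigned long)) {
                if (*ls != *lt)
                        return 1;
                ls++;
                lt++;
                count -= sizeof(unsigned long);
        }
        if (count) {
                /* Mask both sides at once: (a ^ b) & mask is zero iff
                 * the low `count` bytes agree. */
                unsigned long mask =
                        ~0UL >> ((sizeof(unsigned long) - count) * 8);
                return ((*ls ^ *lt) & mask) != 0;
        }
        return 0;
}
```

Masking the XOR of the two words, rather than masking each string
separately, is what makes the tail cost the same whether one or both
operands need masking.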
But still, even if the base cost were doubled, something like 20% of
kernel cycles may still be spent in name string handling. The 8-byte
memcmp takes my microbenchmark down to 8 cycles per byte, so it may gain
several more percent on git diff. Careful thought about the initial
strncpy_from_user and the hashing code could shave off another few
cycles. Well worth investigating, I think.
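As a sketch of the "do everything with 8-byte loads" direction: the
component separation above comes down to spotting a '/' somewhere in a
loaded word, and the classic has-zero-byte bit trick does that without
a byte loop. The has_slash name and the user-space framing are mine,
not from any posted patch:

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Hypothetical helper: nonzero iff some byte of `word` equals '/'.
 * XOR turns any '/' byte into 0x00, then the exact zero-byte test
 * (v - 0x01..01) & ~v & 0x80..80 sets the high bit of each zero byte. */
static inline uint64_t has_slash(uint64_t word)
{
        uint64_t v = word ^ (ONES * '/');
        return (v - ONES) & ~v & HIGHS;
}
```

On little-endian, the byte offset of the first '/' falls out of the
result's trailing zero count divided by 8, so the same load could feed
both the component split and the hash mix.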