Nick Piggin:
> The standard memcmp function on a Westmere system shows up hot in
> profiles in the `git diff` workload (both parallel and single threaded),
> and it is likely due to the costs associated with trapping into
> microcode, and little opportunity to improve memory access (dentry
> name is not likely to take up more than a cacheline).

Let me make sure I understand. What you are pointing out is:
- asm("repe; cmpsb") may hold the CPU for a long time, which can be a
  hazard for scaling.
- by breaking it into smaller pieces, the chances of scaling improve.
Right?

Anyway, this approach of replacing the smallest code with larger but
faster code is interesting.
How about mixing 'unsigned char *' and 'unsigned long *' when
referencing the given strings? For example,

    #include <stddef.h>

    int f(const unsigned char *cs, const unsigned char *ct, size_t count)
    {
        int ret;
        /* s.l and s.c share storage, so the byte loop resumes
         * exactly where the word loop stopped */
        union {
            const unsigned long *l;
            const unsigned char *c;
        } s, t;

    /* this macro is your dentry_memcmp() actually */
    #define cmp(s, t, c, step)                  \
        do {                                    \
            while ((c) >= (step)) {             \
                ret = (*(s) != *(t));           \
                if (ret)                        \
                    return ret;                 \
                (s)++;                          \
                (t)++;                          \
                (c) -= (step);                  \
            }                                   \
        } while (0)

        s.c = cs;
        t.c = ct;
        /* compare sizeof(long) bytes per step (possibly unaligned
         * loads, which x86 tolerates) ... */
        cmp(s.l, t.l, count, sizeof(*s.l));
        /* ... then the remaining tail byte by byte */
        cmp(s.c, t.c, count, sizeof(*s.c));
    #undef cmp
        return 0;
    }

What I am thinking here is:
- for load and compare, there is probably no difference in cost between
  'char *' and 'long *'.
- obviously, stepping by sizeof(long) will reduce the number of
  iterations.
- but I am not sure whether dentry names are generally longer than 4
  (or 8) bytes.

J. R. Okajima
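For comparison, here is a self-contained sketch of the same two-phase idea (word-sized steps first, then a byte-sized tail). It is not Nick's dentry_memcmp() and not a tested kernel patch; the name dentry_memcmp_words() is hypothetical. It also swaps the union-of-pointers cast for memcpy() loads, an assumption chosen so the sketch stays correct for unaligned names and under strict aliasing. Like the loop in f() it only reports equal (0) or not equal (nonzero), not memcmp()-style ordering.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original thread): equality-only
 * comparison of the first `count` bytes of cs and ct.
 * Returns 0 if they are equal, 1 otherwise; no ordering is implied.
 */
static int dentry_memcmp_words(const unsigned char *cs,
                               const unsigned char *ct, size_t count)
{
    /* Word phase: compare sizeof(unsigned long) bytes per step. */
    while (count >= sizeof(unsigned long)) {
        unsigned long a, b;

        /* memcpy() instead of a pointer cast: safe for unaligned
         * addresses and strict aliasing; compilers normally emit a
         * single load for it. */
        memcpy(&a, cs, sizeof(a));
        memcpy(&b, ct, sizeof(b));
        if (a != b)
            return 1;
        cs += sizeof(unsigned long);
        ct += sizeof(unsigned long);
        count -= sizeof(unsigned long);
    }

    /* Byte phase: fewer than sizeof(unsigned long) bytes remain. */
    while (count) {
        if (*cs != *ct)
            return 1;
        cs++;
        ct++;
        count--;
    }
    return 0;
}
```

For instance, with an 8-byte unsigned long, comparing the 10-byte names "abcdefghij" and "abcdefghiX" takes one word step and then two byte steps before the mismatch is reported. Whether the word phase pays off in practice depends on the open question above: how often dentry names are longer than 4 (or 8) bytes.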