On 04/04/2011 12:29 PM, Erik Faye-Lund wrote: > > Wouldn't this part be cleaner as a constant-length loop? Any > optimizing compiler should end up unrolling this, and we don't get as > much code-duplication... > It would. > Oddly enough, this change gave me a drastic (> 20%) performance > increase in my test (isolating get_mode in a separate compilation > unit, and calling it in a loop): > > diff --git a/tree-walk.c b/tree-walk.c > index b8d504b..114ad63 100644 > --- a/tree-walk.c > +++ b/tree-walk.c > @@ -7,42 +7,30 @@ > static const char *get_mode(const char *str, unsigned int *modep) > { > unsigned char c; > - unsigned int mode = 0; > + unsigned int mode = 0, i; > > /* > - * Unroll what looks like a loop since the bounds are > + * Allow the compiler to unroll the loop since the bounds are > * well-known. There should be at least 5 and at most 6 > * characters available in any valid mode, as '40000' is the > * shortest while '160000' (S_IFGITLINK) is the longest. > */ > - /* char 1 */ > - c = *str++; > - if (c< '0' || c> '7') return NULL; > - mode = (mode<< 3) + (c - '0'); > - /* char 2 */ > - c = *str++; > - if (c< '0' || c> '7') return NULL; > - mode = (mode<< 3) + (c - '0'); > - /* char 3 */ > - c = *str++; > - if (c< '0' || c> '7') return NULL; > - mode = (mode<< 3) + (c - '0'); > - /* char 4 */ > - c = *str++; > - if (c< '0' || c> '7') return NULL; > - mode = (mode<< 3) + (c - '0'); > - /* char 5 */ > - c = *str++; > - if (c< '0' || c> '7') return NULL; > - mode = (mode<< 3) + (c - '0'); > + for (i = 0; i< 5; ++i) { s/i< 5/i < 5/ (nitpicking, yes) > + c = *str++; > + if (c< '0' || c> '7') > + return NULL; > + mode = (mode<< 3) + (c - '0'); This could be (micro-)optimized further as: --%<--%<-- for (i = 0; i < 5; i++) { int c = *str++ - '0'; if (c & ~7) { /* 5 chars is ok, so long as *str is now a space */ if (i == 5 && c == ' ') { *modep = mode; return str; } return NULL; } mode = (mode << 3) + c; } /* this should be a space, or we've got a malformed entry */ if (*str != ' ') return NULL; return str + 1; --%<--%<-- This should be slightly faster since there's now only one comparison inside the loop and since one branch contains a second branch fork-point and an inevitable early return the branch prediction machinery in gcc will favor the most common case properly, but pre-computations on 'mode' have to be undone in one case of hitting the early return, it might be faster to play pickup after a loop to 5 is done. -- Andreas Ericsson andreas.ericsson@xxxxxx OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html