David Turner <dturner@xxxxxxxxxxxxxxxx> writes:

> Optimize check_refname_component using SSE2 on x86_64.
>
> git rev-parse HEAD is a good test-case for this, since it does almost
> nothing except parse refs.  For one particular repo with about 60k
> refs, almost all packed, the timings are:
>
> Look up table: 29 ms
> SSE2:          23 ms
>
> This cuts about 20% off of the runtime.
>
> The configure.ac changes include code from the GNU C Library written
> by Joseph S. Myers <joseph at codesourcery dot com>.
>
> Ondřej Bílka <neleai@xxxxxxxxx> suggested an SSE2 approach to the

One e-mail address is obfuscated while the other is not; intended?

> substring searches, which netted a speed boost over the SSE4.2 code I
> had initially written.
>
> Signed-off-by: David Turner <dturner@xxxxxxxxxxx>
> ---

> diff --git a/git-compat-util.h b/git-compat-util.h
> index f6d3a46..291d46b 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #endif
>  #endif
>
> +#if defined(__GNUC__) && defined(__x86_64__)
> +#include <emmintrin.h>
> +/* This is the system memory page size; it's used so that we can read

Style (there are other instances of the same kind).

	/*
	 * This is the ...

> + * outside the bounds of an allocation without segfaulting.
> + */

> +static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
> +{
> +	if (cp == refname)
> +		return 0; /* Component has zero length. */
> +	if (refname[0] == '.') {
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1; /* Component starts with '.'. */
> +		/*
> +		 * Even if leading dots are allowed, don't allow "."
> +		 * as a component (".." is prevented by a rule above).
> +		 */
> +		if (refname[1] == '\0')
> +			return -1; /* Component equals ".". */
> +	}
> +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> +		return -1; /* Refname ends with ".lock". */

This is merely moved code that retained the same comment, but it
is more like "the current refname component ends with .lock", I
suspect.  In other words, we do not allow "refs/heads/foo.lock/bar".

Am I reading the patch correctly?

> +#if defined(__GNUC__) && defined(__x86_64__)
> +#define SSE_VECTOR_BYTES 16
> +
> +/* Vectorized version of check_refname_format. */
> +int check_refname_format(const char *refname, int flags)
> +{
> +	const char *cp = refname;
> +
> +	const __m128i dot = _mm_set1_epi8 ('.');

Style (there are other instances of the same kind).  No SP between
function/macro name and opening parenthesis.

> +	if (refname[0] == '.') {
> +		if (refname[1] == '/' || refname[1] == '\0')
> +			return -1;
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1;
> +	}
> +	while(1) {
> +		__m128i tmp, tmp1, result;
> +		uint64_t mask;
> +
> +		if ((uintptr_t) cp % PAGE_SIZE > PAGE_SIZE - SSE_VECTOR_BYTES - 1)

OK, so we make sure we do not overrun by reading too much near the
end of the page, as the next page might be unmapped.

I am showing my ignorance but does cp (i.e. refname) upon entry to
this function need to be aligned in some way?

Thanks.
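
The end-of-page comparison in the last quoted hunk can be tried in
isolation.  What follows is a minimal sketch, not code from the patch:
the 4096-byte page size and the names SKETCH_PAGE_SIZE,
SKETCH_VECTOR_BYTES, too_close_to_page_end(), read_crosses_page() and
page_check_examples() are all assumptions made for illustration.  On
the alignment question, SSE2 offers both _mm_load_si128(), which
requires a 16-byte-aligned address, and _mm_loadu_si128(), which does
not; which of the two the patch uses is not visible in the quoted
hunk, so the sketch only covers the address arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the patch supplies its own. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)
#define SKETCH_VECTOR_BYTES ((uintptr_t)16)

/*
 * Same comparison as in the quoted hunk: nonzero when the offset of
 * "addr" within its page exceeds PAGE_SIZE - VECTOR_BYTES - 1.
 */
int too_close_to_page_end(uintptr_t addr)
{
	return addr % SKETCH_PAGE_SIZE >
		SKETCH_PAGE_SIZE - SKETCH_VECTOR_BYTES - 1;
}

/*
 * Nonzero when reading "nbytes" bytes starting at "addr" would touch
 * the page that follows the one containing "addr".
 */
int read_crosses_page(uintptr_t addr, size_t nbytes)
{
	return addr % SKETCH_PAGE_SIZE + nbytes > SKETCH_PAGE_SIZE;
}

/* A few boundary cases, using a hypothetical page starting at 0x10000. */
void page_check_examples(void)
{
	const uintptr_t page = 0x10000;

	/* Offset 4079 passes the check, offset 4080 does not. */
	assert(!too_close_to_page_end(page + 4079));
	assert(too_close_to_page_end(page + 4080));

	/* A 16-byte read at offset 4080 still ends inside the page ... */
	assert(!read_crosses_page(page + 4080, 16));
	/* ... but a 17-byte read from the same offset does not. */
	assert(read_crosses_page(page + 4080, 17));
}
```

With these assumed values the comparison rejects exactly those
addresses from which a read of SSE_VECTOR_BYTES + 1 bytes would spill
into the next page, so it leaves one byte of slack relative to a
single 16-byte load.  No alignment of cp is needed for the arithmetic
itself; it depends only on the offset within the page.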