On 05/28/2014 11:04 PM, David Turner wrote: > In a repository with tens of thousands of refs, the command > ~/git/git-diff-index --cached --quiet --ignore-submodules [revision] > is a bit slow. check_refname_component is a major contributor to the > runtime. Provide two optimized versions of this function: one for > machines with SSE4.2, and one for machines without. The speedup for > this command on one particular repo (with about 60k refs) is about 12% > for the SSE version and 8% for the non-SSE version. Interesting. Thanks for the patch. Why do you use "git diff-index" to benchmark your change? Is there something particular about that command that leads to especially bad performance with lots of references? I assume that most of the time spent in check_refname_component() is while reading the packed-refs file, right? If so, there are probably other git commands that would better measure that time, with less other work. For example, "git rev-parse refname" will read the packed-refs file, too (assuming that "refname" is not a loose reference). If you test the speedup of a cheaper command like that, it seems to me that you will get a better idea of how much you are speeding up the ref-reading part. It would also be interesting to see the difference in milliseconds (rather than a percentage) for some specified number of references. I think it's worth using your LUT-based approach to save 8%. I'm less sure it's worth the complication and unreadability of using SSE to get the next 4% speedup. But the gains might be proven to be more significant if you benchmark them differently. I won't critique the code in detail until we have thrashed out the metaissues, but I made a few comments below about nits that I noticed when I scanned through. > Signed-off-by: David Turner <dturner@xxxxxxxxxxx> > --- > Makefile | 6 +++ > configure.ac | 6 +++ > refs.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++-- > t/t5511-refspec.sh | 13 +++++ > 4 files changed, 172 insertions(+), 5 deletions(-) > > diff --git a/Makefile b/Makefile > index a53f3a8..123e2fc 100644 > --- a/Makefile > +++ b/Makefile > @@ -1326,6 +1326,11 @@ else > COMPAT_OBJS += compat/win32mmap.o > endif > endif > +ifdef NO_SSE > + BASIC_CFLAGS += -DNO_SSE > +else > + BASIC_CFLAGS += -msse4 > +endif > ifdef OBJECT_CREATION_USES_RENAMES > COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1 > endif > @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE > @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@ > @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@ > @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@ > + @echo NO_SSE=\''$(subst ','\'',$(subst ','\'',$(NO_SSE)))'\' >>$@ > ifdef TEST_OUTPUT_DIRECTORY > @echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@ > endif > diff --git a/configure.ac b/configure.ac > index b711254..71a9429 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -382,6 +382,11 @@ AS_HELP_STRING([],[Tcl/Tk interpreter will be found in a system.]), > GIT_PARSE_WITH(tcltk)) > # > > +# Declare the with-sse/without-sse options. > +AC_ARG_WITH(sse, > +AS_HELP_STRING([--with-sse],[use SSE instructions (default is YES)]), > +GIT_PARSE_WITH(sse)) > + > > ## Checks for programs. > AC_MSG_NOTICE([CHECKS for programs]) > @@ -449,6 +454,7 @@ else > fi > fi > GIT_CONF_SUBST([TCLTK_PATH]) > +GIT_CONF_SUBST([NO_SSE]) > AC_CHECK_PROGS(ASCIIDOC, [asciidoc]) > if test -n "$ASCIIDOC"; then > AC_MSG_CHECKING([for asciidoc version]) > diff --git a/refs.c b/refs.c > index 28d5eca..8f0de04 100644 > --- a/refs.c > +++ b/refs.c > @@ -5,6 +5,8 @@ > #include "dir.h" > #include "string-list.h" > > +#include <nmmintrin.h> > + You include this file unconditionally, but I doubt that it is present on all platforms. I guess it has to be protected with #ifdef or something at a minimum. > /* > * Make sure "ref" is something reasonable to have under ".git/refs/"; > * We do not like it if: > @@ -29,30 +31,169 @@ static inline int bad_ref_char(int ch) > return 0; > } > Please add a comment here about what the values in refname_disposition signify. And maybe add some line comments explaining unusual values, for people who haven't memorized the ASCII encoding; e.g., on the third line, /* SP -> 0, '.' -> 2, '-' -> 1 */ Also, this variable could be static. And if you change your encoding to use 0 instead of 9 for valid characters, then you could define the table with an explicit length of 256 and omit the second half of the initializers, letting those values be initialized to zero automatically. > +char refname_disposition[] = { > + 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 2, 1, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 0, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 9, 0, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 9, 9, 0, 0, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, > + 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9 > +}; > + > /* > * Try to read one refname component from the front of refname. Return > * the length of the component found, or -1 if the component is not > * legal. > */ > -static int check_refname_component(const char *refname, int flags) > +static int check_refname_component_1(const char *refname, int flags) > { > const char *cp; > char last = '\0'; > > for (cp = refname; ; cp++) { > - char ch = *cp; > - if (ch == '\0' || ch == '/') > + unsigned char ch = (unsigned char) *cp; > + char disp = refname_disposition[ch]; > + switch(disp) { > + case 0: > + return -1; > + case 1: > + goto out; > + case 2: > + if (last == '.') > + return -1; > break; > - if (bad_ref_char(ch)) > - return -1; /* Illegal character in refname. */ > + case 3: > + if (last == '@') > + return -1; > + break; > + } You seem to have some indentation problems. Please always indent with TAB characters. > + > if (last == '.' && ch == '.') > return -1; /* Refname contains "..". */ > if (last == '@' && ch == '{') > return -1; /* Refname contains "@{". */ > last = ch; > } > +out: > + if (cp == refname) > + return 0; /* Component has zero length. */ > + > + if (refname[0] == '.') { > + if (!(flags & REFNAME_DOT_COMPONENT)) > + return -1; /* Component starts with '.'. */ > + /* > + * Even if leading dots are allowed, don't allow "." > + * as a component (".." is prevented by a rule above). > + */ > + if (refname[1] == '\0') > + return -1; /* Component equals ".". */ > + } > + if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5)) > + return -1; /* Refname ends with ".lock". */ > + return cp - refname; > +} > + > +#ifdef NO_SSE > +#define check_refname_component check_refname_component_1 > +#else > + > +#ifndef _SIDD_UBYTE_OPS > +#define _SIDD_UBYTE_OPS 0x00 > +#define _SIDD_CMP_EQUAL_ANY 0x00 > +#define _SIDD_CMP_RANGES 0x04 > +#define _SIDD_CMP_EQUAL_ORDERED 0x0c > +#define _SIDD_NEGATIVE_POLARITY 0x10 > +#endif > +#ifndef PAGE_SIZE > +#define PAGE_SIZE 4096 > +#endif > +#define BLOCK_SIZE 16 > + > +/* Vectorized version of check_refname_component */ > +static int check_refname_component(const char *refname, int flags) > +{ > + const __m128i *refname_vec = (__m128i*) refname; > + > + /* Character ranges for characters forbidden in refs; see > + * above */ > + static const __v16qi bad = { > + 0x01, 0x20, 0x7e, 0x7f, 0x5e, 0x5e, 0x3a, 0x3a, > + 0x5b, 0x5c, 0x2a, 0x2a, 0x3f, 0x3f, 0x3f, 0x3f}; > + > + static const __v16qi nonslashes = { > + '\001', '/' -1, '/' + 1, 0xff, > + }; > + > + static const __v16qi dotdot = {'.','.',0}; > + static const __v16qi atcurly = {'@','{',0}; > + > + const __m128i *vp; > + const char *cp = (const char *)refname_vec; > + > + > + int dotdotpos = BLOCK_SIZE, atcurlypos = BLOCK_SIZE; > + for (vp = refname_vec; ; vp++) { > + __m128i tmp; > + int endpos; > + > + /* Handle case of forbidden substrings .. and @{ crossing > + * sixteen-byte boudaries */ > + if (dotdotpos == 15 && *cp == '.') > + return -1; > + > + if (atcurlypos == 15 && *cp == '{') > + return -1; > + > + if (((uintptr_t) vp & (PAGE_SIZE - 1)) > PAGE_SIZE - BLOCK_SIZE) > + /* End-of-page; fall back to slow method for > + * this entire component. */ > + return check_refname_component_1(refname, flags); > + > + tmp = _mm_lddqu_si128(vp); > + > + /* Find slashes or end-of-string. The double-negative > + * (negative-polarity search for non-slashes) is > + * necessary so that \0 will also be counted. */ > + endpos = _mm_cmpistri((__m128i) nonslashes, tmp, > + _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | > + _SIDD_NEGATIVE_POLARITY); > + > + if (_mm_cmpestrc((__m128i) bad, BLOCK_SIZE, tmp, endpos, > + _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES)) > + return -1; > + > + dotdotpos = _mm_cmpestri((__m128i) dotdot, 2, tmp, endpos, > + _SIDD_UBYTE_OPS | > + _SIDD_CMP_EQUAL_ORDERED); > + if (dotdotpos < 15) > + return -1; > + > + atcurlypos = _mm_cmpestri((__m128i) atcurly, 2, tmp, endpos, > + _SIDD_UBYTE_OPS | > + _SIDD_CMP_EQUAL_ORDERED); > + if (atcurlypos < 15) > + return -1; > + > + if (endpos < BLOCK_SIZE) { > + cp = ((const char*) vp) + endpos; > + break; > + } > + cp = (const char*) vp + BLOCK_SIZE; > + } > + > if (cp == refname) > return 0; /* Component has zero length. */ > + > if (refname[0] == '.') { > if (!(flags & REFNAME_DOT_COMPONENT)) > return -1; /* Component starts with '.'. */ > @@ -67,6 +208,7 @@ static int check_refname_component(const char *refname, int flags) > return -1; /* Refname ends with ".lock". */ > return cp - refname; > } > +#endif > > int check_refname_format(const char *refname, int flags) > { > diff --git a/t/t5511-refspec.sh b/t/t5511-refspec.sh > index c289322..0f03f9c 100755 > --- a/t/t5511-refspec.sh > +++ b/t/t5511-refspec.sh > @@ -84,4 +84,17 @@ test_refspec push 'refs/heads/*/*/for-linus:refs/remotes/mine/*' invalid > test_refspec fetch 'refs/heads/*/for-linus:refs/remotes/mine/*' > test_refspec push 'refs/heads/*/for-linus:refs/remotes/mine/*' > > +test_refspec fetch 'refs/heads/a-very-long-refname' > +test_refspec fetch 'refs/heads/.a-very-long-refname' invalid > +test_refspec fetch 'refs/heads/abcdefgh0123..' invalid > +test_refspec fetch 'refs/heads/abcdefgh01234..' invalid > +test_refspec fetch 'refs/heads/abcdefgh012345..' invalid > +test_refspec fetch 'refs/heads/abcdefgh0123456..' invalid > +test_refspec fetch 'refs/heads/abcdefgh01234567..' invalid > +test_refspec fetch 'refs/heads/abcdefgh0123.a' > +test_refspec fetch 'refs/heads/abcdefgh01234.a' > +test_refspec fetch 'refs/heads/abcdefgh012345.a' > +test_refspec fetch 'refs/heads/abcdefgh0123456.a' > +test_refspec fetch 'refs/heads/abcdefgh01234567.a' > + > test_done > Please mention in your commit message why you added these tests. (Are they to test peculiarities around 16-byte boundaries?) Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html