Re: [PATCH] check_refname_component: Optimize

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Turner <dturner@xxxxxxxxxxxxxxxx> writes:

> In a repository with tens of thousands of refs, the command
> ~/git/git-diff-index --cached --quiet --ignore-submodules [revision]
> is a bit slow.  check_refname_component is a major contributor to the
> runtime.  Provide two optimized versions of this function: one for
> machines with SSE4.2, and one for machines without.  The speedup for
> this command on one particular repo (with about 60k refs) is about 12%
> for the SSE version and 8% for the non-SSE version.
>
> Signed-off-by: David Turner <dturner@xxxxxxxxxxx>

Just a few quick impressions (I do not have time today to look at
new patches).

 - The title seems a bit strange.
   Perhaps "refs.c: optimize check_refname_component()" or something?

 - "~/git/git-diff-index" looks doubly strange in that the issue you
   are addressing would not depend on your compiled Git being
   installed in your $HOME/git at all.  For that matter, from the
   command line and the patch, I am not sure if this is specific to
   "git diff-index", or the same issue exists for anything that
   takes revs and pathspecs from the command line.





> ---
>  Makefile           |   6 +++
>  configure.ac       |   6 +++
>  refs.c             | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  t/t5511-refspec.sh |  13 +++++
>  4 files changed, 163 insertions(+), 5 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index a53f3a8..123e2fc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1326,6 +1326,11 @@ else
>  		COMPAT_OBJS += compat/win32mmap.o
>  	endif
>  endif
> +ifdef NO_SSE
> +	BASIC_CFLAGS += -DNO_SSE
> +else
> +	BASIC_CFLAGS += -msse4
> +endif
>  ifdef OBJECT_CREATION_USES_RENAMES
>  	COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1
>  endif
> @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE
>  	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
>  	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
>  	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
> +	@echo NO_SSE=\''$(subst ','\'',$(subst ','\'',$(NO_SSE)))'\' >>$@
>  ifdef TEST_OUTPUT_DIRECTORY
>  	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@
>  endif
> diff --git a/configure.ac b/configure.ac
> index b711254..71a9429 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -382,6 +382,11 @@ AS_HELP_STRING([],[Tcl/Tk interpreter will be found in a system.]),
>  GIT_PARSE_WITH(tcltk))
>  #
>  
> +# Declare the with-sse/without-sse options.
> +AC_ARG_WITH(sse,
> +AS_HELP_STRING([--with-sse],[use SSE instructions (default is YES)]),
> +GIT_PARSE_WITH(sse))
> +
>  
>  ## Checks for programs.
>  AC_MSG_NOTICE([CHECKS for programs])
> @@ -449,6 +454,7 @@ else
>    fi
>  fi
>  GIT_CONF_SUBST([TCLTK_PATH])
> +GIT_CONF_SUBST([NO_SSE])
>  AC_CHECK_PROGS(ASCIIDOC, [asciidoc])
>  if test -n "$ASCIIDOC"; then
>  	AC_MSG_CHECKING([for asciidoc version])
> diff --git a/refs.c b/refs.c
> index 28d5eca..8ca124c 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -5,6 +5,8 @@
>  #include "dir.h"
>  #include "string-list.h"
>  
> +#include <nmmintrin.h>
> +

We would prefer not to add inclusion of any system header files in
random *.c files, as there often are system dependencies (order of
inclusion, definition of feature macros, etc.) we would rather want
to encapsulate in one place, that is git-compat-util.h.

>  /*
>   * Make sure "ref" is something reasonable to have under ".git/refs/";
>   * We do not like it if:
> @@ -29,30 +31,160 @@ static inline int bad_ref_char(int ch)
>  	return 0;
>  }
>  
> +char refname_disposition[] = {
> +       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> ...
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9
> +};
> +

Does this array need to be extern?

Is this table designed to take both positive and negative values?
If the answer is "I expect we add only positive", then please make
it explicitly "unsigned char".

What do these magic numbers in the array mean?

How were the values derived?  What are you doing in this commit to
help others who later need to debug, fix and enhance this table?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]