Re: [PATCH v8] refs.c: SSE2 optimizations for check_refname_component

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Turner <dturner@xxxxxxxxxxxxxxxx> writes:

> Optimize check_refname_component using SSE2 on x86_64.
>
> git rev-parse HEAD is a good test-case for this, since it does almost
> nothing except parse refs.  For one particular repo with about 60k
> refs, almost all packed, the timings are:
>
> Look up table: 29 ms
> SSE2:          23 ms
>
> This cuts about 20% off of the runtime.
>
> The configure.ac changes include code from the GNU C Library written
> by Joseph S. Myers <joseph at codesourcery dot com>.
>
> Ondřej Bílka <neleai@xxxxxxxxx> suggested an SSE2 approach to the

One e-mail address is obfuscated while the other not; intended?

> substring searches, which netted a speed boost over the SSE4.2 code I
> had initially written.
>
> Signed-off-by: David Turner <dturner@xxxxxxxxxxx>
> ---
> diff --git a/git-compat-util.h b/git-compat-util.h
> index f6d3a46..291d46b 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #endif
>  #endif
>  
> +#if defined(__GNUC__) && defined(__x86_64__)
> +#include <emmintrin.h>
> +/* This is the system memory page size; it's used so that we can read

Style (there are other instances of the same kind).

/*
 * This is the ...

> + * outside the bounds of an allocation without segfaulting.
> + */

> +static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
> +{
> +	if (cp == refname)
> +		return 0; /* Component has zero length. */
> +	if (refname[0] == '.') {
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1; /* Component starts with '.'. */
> +		/*
> +		 * Even if leading dots are allowed, don't allow "."
> +		 * as a component (".." is prevented by a rule above).
> +		 */
> +		if (refname[1] == '\0')
> +			return -1; /* Component equals ".". */
> +	}
> +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> +		return -1; /* Refname ends with ".lock". */

This is merely a moved code that retained the same comment, but it
is more like "the current refname component ends with .lock", I
suspect.  In other words, we do not allow "refs/heads/foo.lock/bar".
Am I reading the patch correctly?

> +#if defined(__GNUC__) && defined(__x86_64__)
> +#define SSE_VECTOR_BYTES 16
> +
> +/* Vectorized version of check_refname_format. */
> +int check_refname_format(const char *refname, int flags)
> +{
> +	const char *cp = refname;
> +
> +	const __m128i dot = _mm_set1_epi8 ('.');

Style (there are other instances of the same kind).  No SP between
function/macro name and opening parenthesis.

> +	if (refname[0] == '.') {
> +		if (refname[1] == '/' || refname[1] == '\0')
> +			return -1;
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1;
> +	}
> +	while(1) {
> +		__m128i tmp, tmp1, result;
> +		uint64_t mask;
> +
> +		if ((uintptr_t) cp % PAGE_SIZE > PAGE_SIZE - SSE_VECTOR_BYTES  - 1)

OK, so we make sure we do not overrun by reading too much near the
end of the page, as the next page might be unmapped.

I am showing my ignorance but does cp (i.e. refname) upon entry to
this function need to be aligned in some way?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]