Re: [PATCH v4 1/2] refs.c: optimize check_refname_component()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for splitting this up into two patches.  See my comments below.

On 06/01/2014 07:17 AM, David Turner wrote:
> In a repository with many refs, check_refname_component can be a major
> contributor to the runtime of some git commands. One such command is
> 
> git rev-parse HEAD
> 
> Timings for one particular repo, with about 60k refs, almost all
> packed, are:
> 
> Old: 35 ms
> New: 29 ms
> 
> Many other commands which read refs are also sped up.
> 
> Signed-off-by: David Turner <dturner@xxxxxxxxxxx>
> ---
>  refs.c | 68 ++++++++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 41 insertions(+), 27 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index 28d5eca..62e2301 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -5,9 +5,32 @@
>  #include "dir.h"
>  #include "string-list.h"
>  
> +/* How to handle various characters in refnames:

We format multiline comments with no text on the opening line:

/*
 * How to handle...
 * ... is derived.
 */

> + * 0: An acceptable character for refs
> + * 1: End-of-component
> + * 2: ., look for a following . to reject .. in refs
> + * 3: @, look for a following { to reject @{ in refs
> + * 9: A bad character, reject ref

This explanation does not agree with the code (or I'm not reading it
correctly).  For example, the character with disposition 3 is '{', not
'@', so ISTM the comment should be

     * 3: {, look for a preceding @ to reject @{ in refnames

BTW, is there a special reason that the values in your table jump from 3
to 9?  I imagine that compilers would use a jump table to implement the
"switch" statement where these values are used, and they might have a
slightly easier time if there are no "holes" between the legal values.

> + *
> + * See below for the list of illegal characters, from which
> + * this table is derived.
> + */
> +static unsigned char refname_disposition[] = {

I think you need to define the length explicitly to 256 here to cause
the entries for 0x80..0xff to be created and initialized to zeros.

The fact that you didn't notice this bug suggests that there might be no
tests involving refnames with non-ASCII characters, which would be
another nice thing to remedy while you are working in this area.

> +	1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +	9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +	9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 2, 1,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 9,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 9, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 9, 9
> +};
> +
>  /*
> - * Make sure "ref" is something reasonable to have under ".git/refs/";
> - * We do not like it if:
> + * Try to read one refname component from the front of refname.
> + * Return the length of the component found, or -1 if the component is
> + * not legal.  It is legal if it is something reasonable to have under
> + * ".git/refs/"; We do not like it if:
>   *
>   * - any path component of it begins with ".", or
>   * - it has double dots "..", or
> @@ -15,24 +38,7 @@
>   * - it ends with a "/".
>   * - it ends with ".lock"
>   * - it contains a "\" (backslash)
> - */
>  

^^^ doesn't this leave a blank line inside the comment?

> -/* Return true iff ch is not allowed in reference names. */
> -static inline int bad_ref_char(int ch)
> -{
> -	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
> -	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
> -		return 1;
> -	/* 2.13 Pattern Matching Notation */
> -	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
> -		return 1;
> -	return 0;
> -}
> -
> -/*
> - * Try to read one refname component from the front of refname.  Return
> - * the length of the component found, or -1 if the component is not
> - * legal.
>   */
> [...]

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]