Re: [PATCH 31/31] gitweb: make hash size independent

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 12 Feb 2019 11:57:52 +0100

On Tue, Feb 12 2019, brian m. carlson wrote:

> Gitweb has several hard-coded 40 values throughout it to check for
> values that are passed in or acquired from Git.  To simplify the code,
> introduce a regex variable that matches either exactly 40 or exactly 64
> hex characters, and use this variable anywhere we would have previously
> hard-coded a 40 in a regex.
>
> Similarly, switch the code that looks for deleted diffinfo information
> to look for either 40 or 64 zeros, and update one piece of code to use
> this function.  Finally, when formatting a log line, allow an
> abbreviated describe output to contain up to 64 characters.

This might be going a bit overboard but I tried this with a variant
where...

> +# A regex matching a valid object ID.
> +our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/;
> +

Instead of this dense regex I did:

    my $sha1_len = 40;
    my $sha256_extra_len = 24;
    my $sha256_len = $sha1_len + $sha256_extra_len;

    sub oid_nlen_regex {
    	my $len = shift;
    	my $hchr = qr/[0-9a-fA-F]/;
    	return qr/(?:(?:$hchr){$len})/
    }

    our $oid_regex;
    {
        my $x = oid_nlen_regex($sha1_len);
        my $y = oid_nlen_regex($sha256_extra_len);
        $oid_regex = qr/(?:$x(?:$y)?)/
    }

Then most of the rest of this is the same, e.g.:
> -	if ($input =~ m/^[0-9a-fA-F]{40}$/) {

But...

> @@ -2037,10 +2040,10 @@ sub format_log_line_html {
>              (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
>              [A-Za-z0-9.-]+
>              (?!\.) # refs can't end with ".", see check_refname_format()
> -            -g[0-9a-fA-F]{7,40}
> +            -g[0-9a-fA-F]{7,64}
>              |
>              # Just a normal looking Git SHA1
> -            [0-9a-fA-F]{7,40}
> +            [0-9a-fA-F]{7,64}
>          )
>          \b
>      }{

E.g. here we can do call oid_nlen_regex("7,64") to produce this blurb.

> -	if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
> +	if ($line =~ m/^index $oid_regex,$oid_regex/) {
> -	} elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
> +	} elsif ($line =~ m/^index $oid_regex..$oid_regex/) {

And here, maybe nobody cares, but we now implicitly accept mixed SHA-1 &
SHA-256 input. Whereas we could have a helper on top of the above code
like:

    sub oid_nlen_prefix_infix_regex {
        my $nlen = shift;
        my $prefix = shift;
        my $infix = shift;

        my $rx = oid_nlen_regex($nlen);

        return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/;
    }

And then e.g.:

    } elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", "..") ||
             $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", "..")) {

So only accept SHA1..SHA1 or SHA256..SHA256, not SHA1..SHA256 or
SHA256..SHA1.