Re: [PATCH v2 1/3] patch-id: make it stable against hunk reordering

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 28 Mar 2014 15:12:17 -0700

"Michael S. Tsirkin" <mst@xxxxxxxxxx> writes:

> +static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx)
>  {
> -	int patchlen = 0, found_next = 0;
> +	unsigned char hash[20];
> +	unsigned short carry = 0;
> +	int i;
> +
> +	git_SHA1_Final(hash, ctx);
> +	git_SHA1_Init(ctx);
> +	/* 20-byte sum, with carry */
> +	for (i = 0; i < 20; ++i) {
> +		carry += result[i] + hash[i];
> +		result[i] = carry;
> +		carry >>= 8;
> +	}

Was there a reason why bitwise xor is not sufficient for mixing
these two indenendent hashes?  If the 20-byte sums do not offer
benefit over that, the code for bitwise xor would be certainly be
simpler, I would imagine?

> +}
> +static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
> +			   struct strbuf *line_buf, int stable)
> +{
> +	int patchlen = 0, found_next = 0, hunks = 0;
>  	int before = -1, after = -1;
> +	git_SHA_CTX ctx, header_ctx;
> +
> +	git_SHA1_Init(&ctx);
> +	hashclr(result);
>  
>  	while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) {
>  		char *line = line_buf->buf;
> @@ -99,6 +116,18 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
>  			if (!memcmp(line, "@@ -", 4)) {
>  				/* Parse next hunk, but ignore line numbers.  */
>  				scan_hunk_header(line, &before, &after);
> +				if (stable) {
> +					if (hunks) {
> +						flush_one_hunk(result, &ctx);
> +						memcpy(&ctx, &header_ctx,
> +						       sizeof ctx);
> +					} else {
> +						/* Save ctx for next hunk.  */
> +						memcpy(&header_ctx, &ctx,
> +						       sizeof ctx);
> +					}
> +				}
> +				hunks++;
>  				continue;
>  			}
>  
> @@ -107,7 +136,10 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
>  				break;
>  
>  			/* Else we're parsing another header.  */
> +			if (stable && hunks)
> +				flush_one_hunk(result, &ctx);
>  			before = after = -1;
> +			hunks = 0;
>  		}
>  
>  		/* If we get here, we're inside a hunk.  */
> @@ -119,39 +151,46 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
>  		/* Compute the sha without whitespace */
>  		len = remove_space(line);
>  		patchlen += len;
> -		git_SHA1_Update(ctx, line, len);
> +		git_SHA1_Update(&ctx, line, len);
>  	}
>  
>  	if (!found_next)
>  		hashclr(next_sha1);
>  
> +	flush_one_hunk(result, &ctx);

What I read from these changes is that you do not do anything
special about the per-file header, so two no overlapping patches
with a single hunk each that touches the same path concatenated
together would not result in the same patch-id as a single-patch
that has the same two hunks.  Which would break your earlier 'Yes,
reordering only the hunks will not make sense, but before each hunk
you could insert the same "diff --git a/... b/..." to make them a
concatenation of patches that touch the same file', I would think.

Is that what we want to happen?  Or is my reading mistaken?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html