Re: [PATCH 1/2] index-pack: hash non-delta objects while reading from stream

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> +static void *unpack_entry_data(unsigned long offset, unsigned long size,
> +			       enum object_type type, unsigned char *sha1)
>  {
>  	int status;
>  	git_zstream stream;
>  	void *buf = xmalloc(size);
> +	git_SHA_CTX c;
> +	char hdr[32];
> +	int hdrlen;
> +	unsigned char *last_out;
> +
> +	if (!is_delta_type(type)) {
> +		hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;

s/+/ & /;

> +		git_SHA1_Init(&c);
> +		git_SHA1_Update(&c, hdr, hdrlen);
> +	} else
> +		sha1 = NULL;
> ...
>  	memset(&stream, 0, sizeof(stream));
>  	git_inflate_init(&stream);
>  	stream.next_out = buf;
>  	stream.avail_out = size;
> +	last_out = buf;
>  
>  	do {
>  		stream.next_in = fill(1);
>  		stream.avail_in = input_len;
>  		status = git_inflate(&stream, 0);
>  		use(input_len - stream.avail_in);
> +		if (sha1)
> +			git_SHA1_Update(&c, last_out, stream.next_out - last_out);
> +		last_out = stream.next_out;
>  	} while (status == Z_OK);

Shouldn't "last_out" be scoped inside the loop, perhaps like this,

	do {
		unsigned char *boo = stream.next_out;
		stream.next_in = fill(1);
		stream.avail_in = input_len;
		status = git_inflate(&stream, 0);
		use(input_len - stream.avail_in);
		if (sha1)
			git_SHA1_Update(&c, boo, stream.next_out - boo);
	} while (status == Z_OK);

instead of leaking it to the whole function?

> @@ -711,6 +730,8 @@ static void resolve_delta(struct object_entry *delta_obj,
>  	free(delta_data);
>  	if (!result->data)
>  		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
> +	hash_sha1_file(result->data, result->size,
> +		       typename(delta_obj->real_type), delta_obj->idx.sha1);
>  	sha1_object(result->data, result->size, delta_obj->real_type,
>  		    delta_obj->idx.sha1);
>  	counter_lock();

The call-chain after your patch looks like this:

    parse_pack_objects()
    . for each incoming object 
      -> unpack_raw_entry()
	 . check type
         -> unpack_entry_data()
            . check type
            . unpack to obtain the data while hashing
	      if the type can be hashed.
      . check type
      . if the data is complete, let sha1_object() check it

The updated sha1_object() no longer hashes but the caller is responsible
for giving a correct hash value for the data, so you added a call to
hash_sha1_file() to the other caller (i.e resolve_delta()) of the
function.

The flow of the resulting code feels somewhat iffy that two functions
that are very far apart in the callchain have to coordinate among
themselves to ensure that the object data is always hashed once: plain
ones in unpack_entry_data() and deltified ones in resolve_delta().  It
smells like inviting future bugs when we introduce new in-pack types
that are neither delta or plain.

Overall it looks correct.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]