Re: [PATCH 09/23] pack v4: commit object encoding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nicolas Pitre <nico@xxxxxxxxxxx> writes:

> This goes as follows:
>
> - Tree reference: either variable length encoding of the index
>   into the SHA1 table or the literal SHA1 prefixed by 0 (see
>   add_sha1_ref()).
>
> - Parent count: variable length encoding of the number of parents.
>   This is normally going to occupy a single byte but doesn't have to.
>
> - List of parent references: a list of add_sha1_ref() encoded references,
>   or nothing if the parent count was zero.
>
> - Author reference: variable length encoding of an index into the author
>   string dictionary table which also covers the time zone.  To make the
>   overall encoding efficient, the author table is already sorted by usage
>   frequency so the most used names are first and require the shortest
>   index encoding.
>
> - Author time stamp: variable length encoded.  Year 2038 ready!
>
> - Committer reference: same as author reference.
>
> - Committer time stamp: same as author time stamp.
>
> The remainder of the canonical commit object content is then zlib
> compressed and appended to the above.
>
> Rationale: The most important commit object data is densely encoded while
> requiring no zlib inflate processing, and all SHA1 references are most
> likely to be direct indices into the pack index file requiring no SHA1
> search into the pack index file.
>
> Signed-off-by: Nicolas Pitre <nico@xxxxxxxxxxx>
> ---
>  packv4-create.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 119 insertions(+)
>
> diff --git a/packv4-create.c b/packv4-create.c
> index bf33d15..cedbbd9 100644
> --- a/packv4-create.c
> +++ b/packv4-create.c
> @@ -13,6 +13,9 @@
>  #include "tree-walk.h"
>  #include "pack.h"
>  
> +
> +static int pack_compression_level = Z_DEFAULT_COMPRESSION;
> +
>  struct data_entry {
>  	unsigned offset;
>  	unsigned size;
> @@ -289,6 +292,122 @@ static unsigned char *add_sha1_ref(unsigned char *dst, const unsigned char *sha1
>  	return dst + 20;
>  }
>  
> +/*
> + * This converts a canonical commit object buffer into its
> + * tightly packed representation using the already populated
> + * and sorted commit_name_table dictionary.  The parsing is
> + * strict so to ensure the canonical version may always be
> + * regenerated and produce the same hash.
> + */
> +void * conv_to_dict_commit(void *buffer, unsigned long *psize)

Drop SP between asterisk and "conv_"?

> +{
> +	unsigned long size = *psize;
> +	char *in, *tail, *end;
> +	unsigned char *out;
> +	unsigned char sha1[20];
> +	int nb_parents, index, tz_val;
> +	unsigned long time;
> +	z_stream stream;
> +	int status;
> +
> +	/*
> +	 * It is guaranteed that the output is always going to be smaller
> +	 * than the input.  We could even do this conversion in place.
> +	 */
> +	in = buffer;
> +	tail = in + size;
> +	buffer = xmalloc(size);
> +	out = buffer;
> +
> +	/* parse the "tree" line */
> +	if (in + 46 >= tail || memcmp(in, "tree ", 5) || in[45] != '\n')
> +		goto bad_data;
> +	if (get_sha1_hex(in + 5, sha1) < 0)
> +		goto bad_data;

Is this strict enough to guarantee roundtrip hash identity?  Because
get_sha1_hex() accepts hexadecimal represented with uppercase A-F,
you need to reject such a "broken" commit object, no?

Same for parent commit object names below that are parsed with the
same helper.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]