Re: [PATCH 10/23] pack v4: tree object encoding

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 27 Aug 2013 08:44:09 -0700

Nicolas Pitre <nico@xxxxxxxxxxx> writes:

> This goes as follows:
>
> - Number of tree entries: variable length encoded.
>
> Then for each tree entry:
>
> - Path component reference: variable length encoded index into the path
>   string dictionary table which also covers the entry mode.  To make the
>   overall encoding efficient, the author table is already sorted by usage
>   frequency so the most used names are first and require the shortest
>   index encoding.

s/author table/tree path table/, I think. The reason why it is a
good idea to sort these tables by use count applies equally to both
the author table and the tree path table, though.

> - SHA1 reference: either variable length encoding of the index into the
>   SHA1 table or the literal SHA1 prefixed by 0 (see add_sha1_ref()).
>
> Rationale: all the tree object data is densely encoded while requiring
> no zlib inflate processing, and all SHA1 references are most likely to
> be direct indices into the pack index file requiring no SHA1 search.
> Path filtering can be accomplished on the path index directly without
> any string comparison during the tree traversal.
>
> Still lacking is some kind of delta encoding for multiple tree objects
> with only small differences between them.  But that'll come later.
>
> Signed-off-by: Nicolas Pitre <nico@xxxxxxxxxxx>
> ---
>  packv4-create.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>
> diff --git a/packv4-create.c b/packv4-create.c
> index cedbbd9..f46fbd9 100644
> --- a/packv4-create.c
> +++ b/packv4-create.c
> @@ -408,6 +408,69 @@ bad:
>  	return NULL;
>  }
>  
> +/*
> + * This converts a canonical tree object buffer into its
> + * tightly packed representation using the already populated
> + * and sorted tree_path_table dictionary.  The parsing is
> + * strict so to ensure the canonical version may always be
> + * regenerated and produce the same hash.
> + */
> +void * conv_to_dict_tree(void *buffer, unsigned long *psize)
> +{
> +	unsigned long size = *psize;
> +	unsigned char *in, *out, *end;
> +	struct tree_desc desc;
> +	struct name_entry name_entry;
> +	int nb_entries;
> +
> +	if (!size)
> +		return NULL;
> +
> +	/*
> +	 * We can't make sure the result will always be smaller.
> +	 * The smallest possible entry is "0 x\0<40 byte SHA1>"
> +	 * or 44 bytes.  The output entry may have a realistic path index
> +	 * encoding using up to 3 bytes, and a non indexable SHA1 meaning
> +	 * 41 bytes.  And the output data already has the and nb_entries
> +	 * headers.  In practice the output size will be significantly
> +	 * smaller but for now let's make it simple.
> +	 */
> +	in = buffer;
> +	out = xmalloc(size);
> +	end = out + size;
> +	buffer = out;
> +
> +	/* let's count how many entries there are */
> +	init_tree_desc(&desc, in, size);
> +	nb_entries = 0;
> +	while (tree_entry(&desc, &name_entry))
> +		nb_entries++;
> +	out = add_number(out, nb_entries);
> +
> +	init_tree_desc(&desc, in, size);
> +	while (tree_entry(&desc, &name_entry)) {
> +		int pathlen = tree_entry_len(&name_entry);
> +		int index = dict_add_entry(tree_path_table, name_entry.mode,
> +					   name_entry.path, pathlen);
> +		if (index < 0) {
> +			error("missing tree dict entry");
> +			free(buffer);
> +			return NULL;
> +		}
> +		if (end - out < 45) {
> +			unsigned long sofar = out - (unsigned char *)buffer;
> +			buffer = xrealloc(buffer, sofar + 45);
> +			out = (unsigned char *)buffer + sofar;
> +			end = out + 45;
> +		}
> +		out = add_number(out, index);
> +		out = add_sha1_ref(out, name_entry.sha1);
> +	}
> +
> +	*psize = out - (unsigned char *)buffer;
> +	return buffer;
> +}
> +
>  static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
>  {
>  	unsigned i, nr_objects = p->num_objects;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html