Derrick Stolee <stolee@xxxxxxxxx> writes:

> +char* get_commit_graph_filename_hash(const char *pack_dir,

Asterisk sticks to the identifier, not type, in our codebase.

> +				     struct object_id *hash)
> +{
> +	size_t len;
> +	struct strbuf path = STRBUF_INIT;
> +	strbuf_addstr(&path, pack_dir);
> +	strbuf_addstr(&path, "/graph-");
> +	strbuf_addstr(&path, oid_to_hex(hash));
> +	strbuf_addstr(&path, ".graph");

Use of strbuf_addf() would make it easier to read and maintain, no?

> +
> +	return strbuf_detach(&path, &len);
> +}
> +
> +static void write_graph_chunk_fanout(struct sha1file *f,
> +				     struct commit **commits,
> +				     int nr_commits)
> +{
> +	uint32_t i, count = 0;
> +	struct commit **list = commits;
> +	struct commit **last = commits + nr_commits;
> +
> +	/*
> +	 * Write the first-level table (the list is sorted,
> +	 * but we use a 256-entry lookup to be able to avoid
> +	 * having to do eight extra binary search iterations).
> +	 */
> +	for (i = 0; i < 256; i++) {
> +		while (list < last) {
> +			if ((*list)->object.oid.hash[0] != i)
> +				break;
> +			count++;
> +			list++;
> +		}

If count and list are always incremented in unison, perhaps you do
not need an extra variable "last".  If typeof(nr_commits) is wider
than typeof(count), this loop and the next write-be32 are screwed
anyway ;-)

This comment probably applies equally to some other uses of the
same "compute last pointer to compare with running pointer for
termination" pattern in this patch.

> +		sha1write_be32(f, count);
> +	}
> +}

> +static int if_packed_commit_add_to_list(const struct object_id *oid,

That is a strange name.  "collect packed commits", perhaps?
> +					struct packed_git *pack,
> +					uint32_t pos,
> +					void *data)
> +{
> +	struct packed_oid_list *list = (struct packed_oid_list*)data;
> +	enum object_type type;
> +	unsigned long size;
> +	void *inner_data;
> +	off_t offset = nth_packed_object_offset(pack, pos);
> +	inner_data = unpack_entry(pack, offset, &type, &size);
> +
> +	if (inner_data)
> +		free(inner_data);
> +
> +	if (type != OBJ_COMMIT)
> +		return 0;
> +
> +	ALLOC_GROW(list->list, list->nr + 1, list->alloc);

This probably will become inefficient in large repositories.  You
know you'll be walking all the pack files, and the total number of
objects in a packfile can be read cheaply, so it may make sense to
make a rough guesstimate of the number of commits (e.g. 15-25% of
the total number of objects) in the repository and allocate the
list upfront, instead of a hardcoded 1024.