Re: [RFC] introducing git replay

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 13 Apr 2022 10:48:48 -0700

Edmundo Carmona Antoranz <eantoranz@xxxxxxxxx> writes:

I've already gave my take on "is this interesting?" question with a
"not really", but let's look at the code, as the expertise will
translate to your future contributions easily, even if this
particular code may turn out not to be used in the project.

> +static unsigned int replay_indexof(struct commit *commit,
> +				   struct commit_list *list)
> +{
> +	int res;
> +	if (list == NULL)
> +		return -1;

We encourage to have a blank line between the end of the decls and
the beginning of the statements.  Add one after the line that
declares the variable "res".  

In an existing code that consistently uses the abbreviation, it is a
different story, but "result" is not too long to spell out, and
because you do not use the variable that often anyway, you would
avoid unnecessary friction on the readers not to abbreviate it to
"res" here.

> +	if (!oidcmp(&list->item->object.oid,
> +		    &commit->object.oid))

	if (!oidcmp(&list->item->object.oid, &commit->object.oid))

is 66 columns wide and this line is wrapped too short, without an
apparent upside to make the result any easier to read.

> +		return 0;
> +	res = replay_indexof(commit, list->next);

Do we need to go recursive here?  It feels wasteful, compared to
iteratively doing this.  FWIW, is the singly chained commit_list the
best data structure if you have "a collection of commits, among
which you'd need to find an existing one, if any"?  You may want to
consider using a hashset, perhaps, as the only thing you seem to be
getting out of the data structure is "is this among the old_commits
set?"  Another possibility, if you do not call APIs that use the
object flags, may be to allocate a flag bit and mark these commits
in the original history you discover via get_revision(), instead of
placing them in a singly chained commit_list structure.  Then the
loop in replay_commit() can just see if parent->object.flags has
that "in the original history?" bit set to decide.

> +	return res < 0 ? res : res + 1;
> +}
> +
> +static struct commit *replay_find_commit(const char *name)
> +{
> +	struct commit *commit = lookup_commit_reference_by_name(name);
> +	if (!commit)
> +		die(_("no such branch/commit '%s'"), name);
> +	return commit;
> +}
> +
> +static struct commit* replay_commit(struct commit * orig_commit)

In our codebase, an asterisk sticks to the identifier, not the type.

> +{
> +	struct pretty_print_context ctx = {0};
> +	struct strbuf body = STRBUF_INIT;
> +	struct strbuf author = STRBUF_INIT;
> +	struct strbuf committer = STRBUF_INIT;
> +	struct object_id new_commit_oid;
> +	struct commit *new_commit;
> +
> +	struct commit_list *new_parents_head = NULL;
> +	struct commit_list **new_parents = &new_parents_head;
> +	struct commit_list *parents = orig_commit->parents;
> +	while (parents) {

It is not _wrong_ to have a blank line inside the decl block if
there is a logical separation between the groups.  I am not sure if
the one in the above after "struct commit *new_commit" qualifies as
one.

Regardless, after such a large decl block, we want a blank line
before the first statement.

> +		struct commit *parent = parents->item;
> +		int commit_index;
> +		struct commit *new_parent;
> +
> +		commit_index = replay_indexof(parent, old_commits);
> +
> +		if (commit_index < 0)
> +			 // won't be replayed, use the original parent

We still frown upon // comments in this codebase.

> +			new_parent = parent;
> +		else {
> +			// it might have been translated already
> +			if (!new_commits[commit_index])
> +				new_commits[commit_index] = replay_commit(parent);
> +			new_parent = new_commits[commit_index];
> +		}
> +		new_parents = commit_list_append(new_parent, new_parents);
> +		parents = parents->next;
> +	}
> +
> +	format_commit_message(orig_commit, "%B", &body, &ctx);
> +	// TODO timezones
> +	format_commit_message(orig_commit, "%an <%ae> %at +0000", &author, &ctx);
> +	// TODO consider committer (control with an option)
> +	format_commit_message(orig_commit, "%cn <%ce> %ct +0000", &committer, &ctx);

Yuck.  Shouldn't this code, whose purpose is to replay the messages
and metadata of the commit as faithfully as possible while
reparenting them, be just reusing the original commit object?

Perhaps turning save_commit_buffer on, reading the original commit
object in the raw format, rewriting the parent pointers (and nothing
else) and finally calling write_object_file() to create the new
commit object is what this code should be doing instead.

And that would be another reason why this is probably better done as
fast-export piped to fast-import, but this message is not about the
design of the "feature" itself, so let me stop talking about that.