Re: [PATCH 1/2] stash--helper: implement "git stash--helper"

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 28 Jan 2016 15:06:27 -0800

Matthias Asshauer <mha1993@xxxxxxx> writes:

> From: Matthias Aßhauer <mha1993@xxxxxxx>
>
> This patch implements a new "git stash--helper" builtin plumbing
> command that will be used to migrate "git-stash.sh" to C.
>
> We start by implementing only the "--non-patch" option that will
> handle the core part of the non-patch stashing.
>
> Signed-off-by: Matthias Aßhauer <mha1993@xxxxxxx>
> ---
>  Makefile                |  2 ++
>  builtin.h               |  1 +
>  builtin/stash--helper.c | 13 ++++++++++
>  git.c                   |  1 +
>  stash.c                 | 65 +++++++++++++++++++++++++++++++++++++++++++++++++
>  stash.h                 | 11 +++++++++

Hmph, why not have everything inside builtin/stash--helper.c?  I do
not quite see a point of having the other two "library-ish" looking
files.

Also I personally feel that it would be easier to review when
these two patches are squashed into one.  I had to go back and forth
while reading the "non-patch" C function to see what logic from the
scripted version it is trying to replace.

> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> new file mode 100644
> index 0000000..542e782
> --- /dev/null
> +++ b/builtin/stash--helper.c
> @@ -0,0 +1,13 @@
> +#include "../stash.h"
> +#include <string.h>
> +
> +static const char builtin_stash__helper_usage[] = {
> +	"Usage: git stash--helper --non-patch <tmp_indexfile> <i_tree>"
> +};
> +
> +int cmd_stash__helper(int argc, const char **argv, const char *prefix)
> +{
> +	if (argc == 4 && !strcmp("--non-patch", argv[1]))
> +		return stash_non_patch(argv[2], argv[3], prefix);
> +	usage(builtin_stash__helper_usage);
> +}

This is meant to replace this sequence:

	git read-tree --index-output="$TMPindex" -m $i_tree &&
	GIT_INDEX_FILE="$TMPindex" &&
	export GIT_INDEX_FILE &&
	git diff --name-only -z HEAD -- >"$TMP-stagenames" &&
	git update-index -z --add --remove --stdin <"$TMP-stagenames" &&
	git write-tree &&
	rm -f "$TMPindex"

And outside of this section of the script, $TMPindex is never looked
at after this part finishes (which is obvious as the last thing the
section does is to remove it).  As you are rewriting this whole
section in C, you should wonder if you can do it without using a
temporary file in the filesystem at all.

> diff --git a/stash.c b/stash.c
> new file mode 100644
> index 0000000..c3d6e67
> --- /dev/null
> +++ b/stash.c
> @@ -0,0 +1,65 @@
> +#include "stash.h"
> +#include "strbuf.h"
> +
> +static int prepare_update_index_argv(struct argv_array *args,
> +	struct strbuf *buf)
> +{
> +	struct strbuf **bufs, **b;
> +
> +	bufs = strbuf_split(buf, '\0');
> +	for (b = bufs; *b; b++)
> +		argv_array_pushf(args, "%s", (*b)->buf);
> +	argv_array_push(args, "--");
> +	strbuf_list_free(bufs);
> +
> +	return 0;
> +}
> +
> +int stash_non_patch(const char *tmp_indexfile, const char *i_tree,
> +	const char *prefix)
> +{
> +	int result;
> +	struct child_process read_tree = CHILD_PROCESS_INIT;
> +	struct child_process diff = CHILD_PROCESS_INIT;
> +	struct child_process update_index = CHILD_PROCESS_INIT;
> +	struct child_process write_tree = CHILD_PROCESS_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	argv_array_push(&read_tree.args, "read-tree");
> +	argv_array_pushf(&read_tree.args, "--index-output=%s", tmp_indexfile);
> +	argv_array_pushl(&read_tree.args, "-m", i_tree, NULL);
> +
> +	argv_array_pushl(&diff.args, "diff", "--name-only", "-z", "HEAD", "--",
> +		NULL);
> +
> +	argv_array_pushl(&update_index.args, "update-index", "--add",
> +		"--remove", NULL);
> +
> +	argv_array_push(&write_tree.args, "write-tree");

Honestly, I had high hopes after seeing the "we are rewriting it in
C" but I am not enthused after seeing this.  I was hoping that the
rewritten version would do this all in-core, by calling these
functions that we already have:

 * read_index() to read the current index into the_index;

 * unpack_trees() to overlay the contents of i_tree to the contents
   of the index;

 * run_diff_index() to make the comparison between the result of the
   above and HEAD to collect the paths that are different (you'd use
   DIFF_FORMAT_CALLBACK mechanism to collect paths---see wt-status.c
   for existing code that already does this for hints);

 * add_to_index() to add or remove paths you found in the previous
   step to the in-core index;

 * write_cache_as_tree() to write out the resulting index of the
   above sequence of calls to a new tree object;

 * sha1_to_hex() to convert that resulting tree object name to hex
   format;

 * puts() to output the result.

Actually, because i_tree is coming from $(git write-tree) that was
done earlier on the current index, the unpack_trees() step should
not even be necessary.

The first three lines of the scripted version:

	git read-tree --index-output="$TMPindex" -m $i_tree &&
	GIT_INDEX_FILE="$TMPindex" &&
	export GIT_INDEX_FILE &&

are creating a new file $TMPindex on the filesystem while preserving
the cached stat info when it can, which is a glorified version of:

	cp -p $GIT_INDEX_FILE $TMPindex

In fact, versions of "git stash" before 3ba2e865 (stash: copy the
index using --index-output instead of cp -p, 2011-03-16) simply
did a "cp -p".

A C rewrite that works all in-core does not even need to write out a
temporary; it can just read the current index and do various things
up to writing the contents of the in-core index as a tree, and the
result would be correct as long as you do not forget *NOT* to write
the in-core index out to $GIT_INDEX_FILE.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html