Re: [RFC PATCH v1 02/17] merge-one-file: rewrite in C

Alban Gruin <alban.gruin@xxxxxxxxx> · Sun, 12 Jul 2020 13:22:26 +0200 (CEST)

Hi Phillip,

Phillip Wood (phillip.wood123@xxxxxxxxx) a écrit :

> Hi Alban
> 
> I think this series is a great idea
> 
> On 25/06/2020 13:19, Alban Gruin wrote:
> -%<-
> > diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c
> > new file mode 100644
> > index 0000000000..4992a6cd30
> > --- /dev/null
> > +++ b/builtin/merge-one-file.c
> > @@ -0,0 +1,275 @@
> > +/*
> > + * Builtin "git merge-one-file"
> > + *
> > + * Copyright (c) 2020 Alban Gruin
> > + *
> > + * Based on git-merge-one-file.sh, written by Linus Torvalds.
> > + *
> > + * This is the git per-file merge script, called with
> > + *
> > + *   $1 - original file SHA1 (or empty)
> > + *   $2 - file in branch1 SHA1 (or empty)
> > + *   $3 - file in branch2 SHA1 (or empty)
> > + *   $4 - pathname in repository
> > + *   $5 - original file mode (or empty)
> > + *   $6 - file in branch1 mode (or empty)
> > + *   $7 - file in branch2 mode (or empty)
> 
> nit pick - these are now argv[1] etc rather than $1 etc
> 

I'll change that, and replace "script" by "utility".

> > + *
> > + * Handle some trivial cases.. The _really_ trivial cases have
> > + * been handled already by git read-tree, but that one doesn't
> > + * do any merges that might change the tree layout.
> > + */
> > +
> > +#define USE_THE_INDEX_COMPATIBILITY_MACROS
> > +#include "cache.h"
> > +#include "builtin.h"
> > +#include "commit.h"
> > +#include "dir.h"
> > +#include "lockfile.h"
> > +#include "object-store.h"
> > +#include "run-command.h"
> > +#include "xdiff-interface.h"
> > +
> > +static int create_temp_file(const struct object_id *oid, struct strbuf
> > *path)
> > +{
> > +	struct child_process cp = CHILD_PROCESS_INIT;
> > +	struct strbuf err = STRBUF_INIT;
> > +	int ret;
> > +
> > +	cp.git_cmd = 1;
> > +	argv_array_pushl(&cp.args, "unpack-file", oid_to_hex(oid), NULL);
> > +	ret = pipe_command(&cp, NULL, 0, path, 0, &err, 0);
> > +	if (!ret && path->len > 0)
> > +		strbuf_trim_trailing_newline(path);
> > +
> > +	fprintf(stderr, "%.*s", (int) err.len, err.buf);
> > +	strbuf_release(&err);
> > +
> > +	return ret;
> > +}
> 
> I know others will disagree but personally I'm not a huge fan of rewriting
> shell functions in C that forks other builtins and then converting the C to
> use the internal apis, it seems a much better to just write the proper C
> version the first time. This is especially true for simple function such as
> the ones in this file. That way the reviewer gets a clear view of the final
> code from the patch, rather than having to piece it together from a series of
> additions and deletions.
> 

I understand -- I'll squash the "rewrite" and "use internal APIs" patches 
together as a last step for the v2, so I'd be able to get them back with 
all the changes made in the v2 if needed.

> -%<-
> > +static int do_merge_one_file(const struct object_id *orig_blob,
> > +			     const struct object_id *our_blob,
> > +			     const struct object_id *their_blob, const char
> > *path,
> > +			     unsigned int orig_mode, unsigned int our_mode,
> > unsigned int their_mode)
> > +{
> > +	int ret, source, dest;
> > +	struct strbuf src1 = STRBUF_INIT, src2 = STRBUF_INIT, orig =
> > STRBUF_INIT;
> > +	struct child_process cp_merge = CHILD_PROCESS_INIT,
> > +		cp_checkout = CHILD_PROCESS_INIT,
> > +		cp_update = CHILD_PROCESS_INIT;
> > +
> > +	if (our_mode == S_IFLNK || their_mode == S_IFLNK) {
> > +		fprintf(stderr, "ERROR: %s: Not merging symbolic link
> > changes.\n", path);
> > +		return 1;
> > +	} else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) {
> > +		fprintf(stderr, "ERROR: %s: Not merging conflicting submodule
> > changes.\n",
> > +			path);
> > +		return 1;
> > +	}
> > +
> > +	create_temp_file(our_blob, &src1);
> > +	create_temp_file(their_blob, &src2);
> > +
> > +	if (orig_blob) {
> > +		printf("Auto-merging %s\n", path);
> > +		create_temp_file(orig_blob, &orig);
> > +	} else {
> > +		printf("Added %s in both, but differently.\n", path);
> > +		create_temp_file(the_hash_algo->empty_blob, &orig);
> > +	}
> > +
> > +	cp_merge.git_cmd = 1;
> > +	argv_array_pushl(&cp_merge.args, "merge-file", src1.buf, orig.buf,
> > src2.buf,
> > +			 NULL);
> > +	ret = run_command(&cp_merge);
> > +
> > +	if (ret != 0)
> > +		ret = 1;
> > +
> > +	cp_checkout.git_cmd = 1;
> > +	argv_array_pushl(&cp_checkout.args, "checkout-index", "-f",
> > "--stage=2",
> > +			 "--", path, NULL);
> > +	if (run_command(&cp_checkout))
> > +		return 1;
> > +
> > +	source = open(src1.buf, O_RDONLY);
> > +	dest = open(path, O_WRONLY | O_TRUNC);
> > +
> > +	copy_fd(source, dest);
> > +
> > +	close(source);
> > +	close(dest);
> > +
> > +	unlink(orig.buf);
> > +	unlink(src1.buf);
> > +	unlink(src2.buf);
> > +
> > +	strbuf_release(&src1);
> > +	strbuf_release(&src2);
> > +	strbuf_release(&orig);
> 
> The whole business of creating temporary files and forking seems like a lot of
> effort compared to calling ll_merge() which would also mean we respect any
> merge attributes
> 
> > +
> > +	if (ret) {
> > +		fprintf(stderr, "ERROR: ");
> > +
> > +		if (!orig_blob) {
> 
> I think the original does if (ret || !orig_blob) not &&

Good catch.

> > +			fprintf(stderr, "content conflict");
> > +			if (our_mode != their_mode)
> > +				fprintf(stderr, ", ");
> 
> sentence lego, in any case the message below should be printed regardless of
> content conflicts. We should probably mark all these messages for translation
> as well.
> 

Yeah, I think I will replace them with two calls to `error()'.

> -%<-
> > +int cmd_merge_one_file(int argc, const char **argv, const char *prefix)
> > +{
> > +	struct object_id orig_blob, our_blob, their_blob,
> > +		*p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL;
> > +	unsigned int orig_mode = 0, our_mode = 0, their_mode = 0;
> > +
> > +	if (argc != 8)
> > +		usage(builtin_merge_one_file_usage);
> > +
> > +	if (!get_oid(argv[1], &orig_blob)) {
> > +		p_orig_blob = &orig_blob;
> > +		orig_mode = strtol(argv[5], NULL, 8);
> 
> It would probably make sense to check that strtol() succeeds (and the mode is
> sensible), and also that get_oid() fails because argv[1] is empty, not because
> it is invalid.
> 

Checking that `orig_mode' and friends are lower than 0800, and that 
`*argv[1]' is not equal to '\0' should be enough, right?

> Thanks for working on this

As always, thank you for your reviews.

> Best Wishes
> 
> Phillip
> 
> 

Cheers,
Alban