Re: [PATCH v3 2/2] difftool: implement the functionality in the builtin

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



W dniu 24.11.2016 o 21:55, Johannes Schindelin pisze:

> The current version of the builtin difftool does not, however, make full
> use of the internals but instead chooses to spawn a couple of Git
> processes, still, to make for an easier conversion. There remains a lot
> of room for improvement, left for a later date.
[...]

> Sadly, the speedup is more noticable on Linux than on Windows: a quick
> test shows that t7800-difftool.sh runs in (2.183s/0.052s/0.108s)
> (real/user/sys) in a Linux VM, down from  (6.529s/3.112s/0.644s), while
> on Windows, it is (36.064s/2.730s/7.194s), down from
> (47.637s/2.407s/6.863s). The culprit is most likely the overhead
> incurred from *still* having to shell out to mergetool-lib.sh and
> difftool--helper.sh.

Does this mean that our shell-based testsuite is not well suited to be
benchmark suite for comparing performance on MS Windows?

Or does it mean that "builtin-difftool" spawning Git processes is the
problem?
 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>  builtin/difftool.c | 670 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 669 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index 53870bb..3480920 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -11,9 +11,608 @@
>   *
>   * Copyright (C) 2016 Johannes Schindelin
>   */
> +#include "cache.h"
>  #include "builtin.h"
>  #include "run-command.h"
>  #include "exec_cmd.h"
> +#include "parse-options.h"
> +#include "argv-array.h"
> +#include "strbuf.h"
> +#include "lockfile.h"
> +#include "dir.h"
> +
> +static char *diff_gui_tool;
> +static int trust_exit_code;
> +
> +static const char *const builtin_difftool_usage[] = {
> +	N_("git difftool [<options>] [<commit> [<commit>]] [--] [<path>...]"),
> +	NULL
> +};
> +
> +static int difftool_config(const char *var, const char *value, void *cb)
> +{
> +	if (!strcmp(var, "diff.guitool")) {

Shouldn't you also read other configuration variables, like "diff.tool",
and it's mergetool fallbacks ("merge.guitool", "merge.tool")?

> +		diff_gui_tool = xstrdup(value);
> +		return 0;
> +	}
> +
> +	if (!strcmp(var, "difftool.trustexitcode")) {
> +		trust_exit_code = git_config_bool(var, value);
> +		return 0;
> +	}

Why you do not need to check "difftool.prompt"?  And "mergetool.*" fallbacks?

> +
> +	return git_default_config(var, value, cb);
> +}
> +
> +static int print_tool_help(void)
> +{
> +	const char *argv[] = { "mergetool", "--tool-help=diff", NULL };
> +	return run_command_v_opt(argv, RUN_GIT_CMD);

This looks a bit strange to me, but I guess this is to avoid recursively
invoking ourself, and { "legacy-difftool", "--tool-help", NULL }; isn't
that much better.

> +}
> +
> +static int parse_index_info(char *p, int *mode1, int *mode2,
> +			    struct object_id *oid1, struct object_id *oid2,
> +			    char *status)

  There are only two hard things in Computer Science:
  cache invalidation and naming things.
    -- Phil Karlton

Why did you name function that parses "diff --raw" output (aka "diff-tree"
output) parse_index_info() instead of parse_raw_diff() or parse_diff_tree()?
I went searching for `update-index --index-info` formats...

This is not that important, because this function is file-static; though
future developers would thank you for more descriptive naming.

ADDED: Disregard this, I see this function is about index (?) part of
raw diff, that is only a part of "git diff --raw -z" output.  Though...

> +{
> +	if (*p != ':')
> +		return error("expected ':', got '%c'", *p);
> +	*mode1 = (int)strtol(p + 1, &p, 8);
> +	if (*p != ' ')
> +		return error("expected ' ', got '%c'", *p);

Nitpicking.

I guess because this error shouldn't really happen, and because current
implementation is transient, we don't need to worry about better error
messages (was it problem with parsing, or was it unexpected character).

For example '10064x', or '10064\n' would fail parse, but it is not
space that we were expecting...

> +	*mode2 = (int)strtol(p + 1, &p, 8);
> +	if (*p != ' ')
> +		return error("expected ' ', got '%c'", *p);
> +	if (get_oid_hex(++p, oid1))
> +		return error("expected object ID, got '%s'", p + 1);
> +	p += GIT_SHA1_HEXSZ;
> +	if (*p != ' ')
> +		return error("expected ' ', got '%c'", *p);
> +	if (get_oid_hex(++p, oid2))
> +		return error("expected object ID, got '%s'", p + 1);
> +	p += GIT_SHA1_HEXSZ;
> +	if (*p != ' ')
> +		return error("expected ' ', got '%c'", *p);
> +	*status = *++p;
> +	if (!status || p[1])
> +		return error("unexpected trailer: '%s'", p);
> +	return 0;
> +}
> +
> +/*
> + * Remove any trailing slash from $workdir
> + * before starting to avoid double slashes in symlink targets.
> + */

Err... that's not what add_path() does, in its current implementation.
It doesn't remove trailing slashes, but it checks if there is trailing
slash, and if there isn't, it adds it as separator before adding path.

Or was it original comment from the Perl implementation?  It look
like this, with '$workdir'...  If it is meant to be straight copy
of comment from legacy-difftool, a note would be nice.

> +static void add_path(struct strbuf *buf, size_t base_len, const char *path)

Naming: I think strbuf_addpath() would be a better name, but I guess
it is a matter of taste.

> +{
> +	strbuf_setlen(buf, base_len);
> +	if (buf->len && buf->buf[buf->len - 1] != '/')
> +		strbuf_addch(buf, '/');
> +	strbuf_addstr(buf, path);
> +}
> +
> +/*
> + * Determine whether we can simply reuse the file in the worktree.
> + */
> +static int use_wt_file(const char *workdir, const char *name,

Should it be 'name' or 'pathname'?

> +		       struct object_id *oid)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct stat st;
> +	int use = 0;
> +
> +	strbuf_addstr(&buf, workdir);
> +	add_path(&buf, buf.len, name);

With proposed rename, it would IMVVVHO looks better

  +	strbuf_addstr(&buf, workdir);
  +	strbuf_addpath(&buf, buf.len, name);

But that is a matter of taste (again, the function is file-local).

> +
> +	if (!lstat(buf.buf, &st) && !S_ISLNK(st.st_mode)) {
> +		struct object_id wt_oid;
> +		int fd = open(buf.buf, O_RDONLY);
> +
> +		if (!index_fd(wt_oid.hash, fd, &st, OBJ_BLOB, name, 0)) {
> +			if (is_null_oid(oid)) {
> +				oidcpy(oid, &wt_oid);
> +				use = 1;
> +			} else if (!oidcmp(oid, &wt_oid))
> +				use = 1;
> +		}
> +	}
> +
> +	strbuf_release(&buf);
> +
> +	return use;
> +}

[...]

> +static int ensure_leading_directories(char *path)
> +{
> +	switch (safe_create_leading_directories(path)) {
> +		case SCLD_OK:
> +		case SCLD_EXISTS:
> +			return 0;
> +		default:
> +			return error(_("could not create leading directories "
> +				       "of '%s'"), path);
> +	}
> +}

Nice function, I wonder if it would be useful in other places.

> +
> +static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
> +			int argc, const char **argv)
> +{
> +	char tmpdir[PATH_MAX];
> +	struct strbuf info = STRBUF_INIT, lpath = STRBUF_INIT;
> +	struct strbuf rpath = STRBUF_INIT, buf = STRBUF_INIT;

Nitpicking.

To be symmetric, it could be reordered like this:

  +	struct strbuf info = STRBUF_INIT, buf = STRBUF_INIT;
  +	struct strbuf lpath = STRBUF_INIT, rpath = STRBUF_INIT;

See: lpath, rpath; ldir, rdir; ldir_len, rdir_len.

> +	struct strbuf ldir = STRBUF_INIT, rdir = STRBUF_INIT;
> +	struct strbuf wtdir = STRBUF_INIT;
> +	size_t ldir_len, rdir_len, wtdir_len;

[...]
> +	argv_array_pushl(&child.args, "diff", "--raw", "--no-abbrev", "-z",
> +			 NULL);
> +	for (i = 0; i < argc; i++)
> +		argv_array_push(&child.args, argv[i]);
> +	if (start_command(&child))
> +		die("could not obtain raw diff");
> +	fp = xfdopen(child.out, "r");
> +
> +	/* Build index info for left and right sides of the diff */
> +	i = 0;
> +	while (!strbuf_getline_nul(&info, fp)) {
> +		int lmode, rmode;
> +		struct object_id loid, roid;
> +		char status;
> +		const char *src_path, *dst_path;
> +		size_t src_path_len, dst_path_len;
> +
> +		if (starts_with(info.buf, "::"))
> +			die(N_("combined diff formats('-c' and '--cc') are "
> +			       "not supported in\n"
> +			       "directory diff mode('-d' and '--dir-diff')."));
> +
> +		if (parse_index_info(info.buf, &lmode, &rmode, &loid, &roid,
> +				     &status))

After rename it would read as:

  +		if (parse_raw_diff(info.buf, &lmode, &rmode, &loid, &roid,
  +				   &status))

Though now I see that you parse here index information of raw diff
(I think)... so disregard my musings.

> +			break;
> +		if (strbuf_getline_nul(&lpath, fp))
> +			break;
> +		src_path = lpath.buf;
> +		src_path_len = lpath.len;
> +
> +		i++;
> +		if (status != 'C' && status != 'R') {
> +			dst_path = src_path;
> +			dst_path_len = src_path_len;
> +		} else {
> +			if (strbuf_getline_nul(&rpath, fp))
> +				break;
> +			dst_path = rpath.buf;
> +			dst_path_len = rpath.len;
> +		}


[...]
> +	/*
> +	 * Changes to submodules require special treatment.This loop writes a

Here and in few other places you are missing space after full stop.

  +	 * Changes to submodules require special treatment. This loop writes a


> +	 * temporary file to both the left and right directories to show the
> +	 * change in the recorded SHA1 for the submodule.
> +	 */
> +	hashmap_iter_init(&submodules, &iter);
> +	while ((entry = hashmap_iter_next(&iter))) {
> +		if (*entry->left) {
> +			add_path(&ldir, ldir_len, entry->path);
> +			ensure_leading_directories(ldir.buf);
> +			write_file(ldir.buf, "%s", entry->left);
> +		}
> +		if (*entry->right) {
> +			add_path(&rdir, rdir_len, entry->path);
> +			ensure_leading_directories(rdir.buf);
> +			write_file(rdir.buf, "%s", entry->right);
> +		}
> +	}
> +
> +	/*
> +	 * Symbolic links require special treatment.The standard "git diff"

Same here.

> +	 * shows only the link itself, not the contents of the link target.
> +	 * This loop replicates that behavior.
> +	 */

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]