Re: [PATCH v2] contrib: added git-diffall

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 22 Feb 2012 15:48:33 -0800

Tim Henigan <tim.henigan@xxxxxxxxx> writes:

> This script adds directory diff support to git.  It launches a single
> instance of the user-configured external diff tool and performs a
> directory diff between the specified revisions. The before/after files
> are copied to a tmp directory to do this.  Either 'diff.tool' or
> 'merge.tool' must be set before running the script.
>
> The existing 'git difftool' command already allows the user to view diffs
> using an external tool.  However if multiple files have changed, a
> separate instance of the tool is launched for each one.  This can be
> tedious when many files are involved.

We encourage our log messages to describe the problem first and then
present solution to the problem, so I would update the above perhaps like
this:

	The 'git difftool' command lets the user to use an external tool
	to view diffs, but it runs the tool for one file at the time. This
	makes it tedious to review a change that spans multiple files.

        The "git-diffall" script instead prepares temporary directories
        with preimage and postimage files, and launches a single instance
        of an external diff tool to view the differences in them.
        diff.tool or merge.tool configuration variable is used to specify
        what external tool is used.

I am wondering if reusing "diff.tool" or "merge.tool" is a good idea,
though.

I guess that it is OK to assume that any external tool that can compare
two directories MUST be able to compare two individual files, and if that
is true, it is perfectly fine to reuse the configuration.  But if an
external tool "frobdiff" that can compare two directories cannot compare
two individual files, it will make it impossible for the user to run "git
difftool" if diff.tool is set to "frobdiff" to use with "diffall".

Another thing that comes to my mind is if a user has an external tool that
can use "diffall", is there ever a situation where the user chooses to use
"difftool" instead, to go files one by one.  I cannot offhand imagine any.

Perhaps a longer term integration plan may be to lift the logic from this
script and make it part of "difftool", and then add a boolean variable
"difftool.<tool>.canCompareDirectory", without adding "git diffall" as a
subcommand.  The user can still run "git difftool", and when the external
tool can compare two directories, the code to populate temporary directory
(and to set the trap to clean after itself) taken from this tool will run
inside "git difftool" frontend and then the external tool to compare the
two directories is spawned.

I also think that in a yet longer term, if this mode of "instantiate two
directories to be compared, and let the external tool do the comparison"
proves useful, almost all the "interesting" work done in this script
should be made unnecessary by adding an updated "external diff interface"
on the core side, so that nobody has to hurt his brain to implement an
error-prone command line parsing logic.

In other words, this statement cannot stay true:

> +This script is compatible with all the forms used to specify a
> +range of revisions to diff:

without updating the script every time underlying "git diff" gains new way
of comparing things.  If we move the "prepare two directories, and point
an external tool at them" logic to the core, we do not have to worry about
it at all.

Besides, I do not think the script covers all the forms; "git diff -R"
support is totally missing.

But that is all two steps in the future.

> +# mktemp is not available on all platforms (missing from msysgit)
> +# Use a hard-coded tmp dir if it is not available
> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
> +	tmp=/tmp/git-diffall-tmp
> +}

It would not withstand malicious attacks, but doing

	tmp=/tmp/git-diffall-tmp.$$

would at least protect you from accidental name crashes better in the
fallback codepath.

> +
> +trap 'rm -rf "$tmp" 2>/dev/null' EXIT

Do you need to suppress errors, especially when you are running "rm -rf"
with the 'f' option?

> +mkdir -p "$tmp"
> +
> +left=
> +right=
> +paths=
> +path_sep=
> +compare_staged=
> +common_ancestor=
> +left_dir=
> +right_dir=
> +diff_tool=
> +copy_back=

You can write multiple assignment on a line to save vertical space if you
want to, and the initialization sequence like this is a good place to do
so.

> +while test $# != 0
> +do
> +	case "$1" in
> +	-h|--h|--he|--hel|--help)
> +		usage
> +		;;
> +	--cached)
> +		compare_staged=1
> +		;;
> +	--copy-back)
> +		copy_back=1
> +		;;
> +	-x|--e|--ex|--ext|--extc|--extcm|--extcmd)
> +		diff_tool=$2
> +		shift
> +		;;

What if your command line ends with -x without $2?

Don't you want to match "-t/--tool" that "difftool" already uses?

> +	--)
> +		path_sep=1
> +		;;
> +	-*)
> +		echo Invalid option: "$1"
> +		usage
> +		;;
> +	*)
> +		# could be commit, commit range or path limiter
> +		case "$1" in
> +		*...*)
> +			left=${1%...*}
> +			right=${1#*...}
> +			common_ancestor=1
> +			;;

Strictly speaking, that is not just a common_ancestor but is a merge_base,
which is a common ancestor none of whose children is a common ancestor.

> +		*..*)
> +			left=${1%..*}
> +			right=${1#*..}
> +			;;
> +		*)
> +			if test -n "$path_sep"
> +			then
> +				paths="$paths$1 "
> +			elif test -z "$left"
> +			then
> +				left=$1
> +			elif test -z "$right"
> +			then
> +				right=$1
> +			else
> +				paths="$paths$1 "
> +			fi
> +			;;
> +		esac

Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
contents of the Documentation/ directory between the named commit and
the working tree, like "diff HEAD~2 Documentation/" does.

That is not a show-stopper (a double-dash is an easy workaround), but
it is worth pointing out.

> +		;;
> +	esac
> +	shift
> +done
> +
> +# Determine the set of files which changed
> +if test -n "$left" && test -n "$right"
> +then
> +	left_dir="cmt-$(git rev-parse --short $left)"
> +	right_dir="cmt-$(git rev-parse --short $right)"
> +
> +	if test -n "$compare_staged"
> +	then
> +		usage
> +	elif test -n "$common_ancestor"
> +	then
> +		git diff --name-only "$left"..."$right" -- $paths > "$tmp/filelist"
> +	else
> +		git diff --name-only "$left" "$right" -- $paths > "$tmp/filelist"
> +	fi

And this will not work with pathspec that have $IFS characters.  If we
really wanted to we could do that by properly quoting "$1" when you build
$paths and then use eval when you run "git diff" here (look for 'sq' and
'eval' in existing scripts, e.g. "git-am.sh", if you are interested).

Also you may want to write filelist using -z format to protect yourself
from paths that contain LF, but that would require the loop "while read
name" to be rewritten.

> +# Exit immediately if there are no diffs
> +if test ! -s "$tmp/filelist"
> +then
> +	exit 0
> +fi

Ok, you have trap set already so $tmp will disappear with this exit ;-)

> +if test -n "$copy_back" && test "$right_dir" != "working_tree"
> +then
> +	echo "--copy-back is only valid when diff includes the working tree."
> +	exit 1
> +fi

I actually wondered why $right_dir needs to be populated with a copy in
the first place (if you do not copy but give the working tree itself to
the external tool, you do not even have to copy back).

I know the answer to the question, namely, "because the external tool
thinks files that are not in $left_dir are added files", but if you can
find a way to tell the external tool to ignore new files (similar to how
"diff -r" without -N works), running the tool with temporary left_dir and
the true workdir as right_dir would be a lot cleaner solution to the
problem.

> +# Create the named tmp directories that will hold the files to be compared
> +mkdir -p "$tmp/$left_dir" "$tmp/$right_dir"
> +
> +# Populate the tmp/right_dir directory with the files to be compared
> +if test -n "$right"
> +then
> +	while read name
> +	do
> +		ls_list=$(git ls-tree $right $name)
> +		if test -n "$ls_list"
> +		then
> +			mkdir -p "$tmp/$right_dir/$(dirname "$name")"
> +			git show "$right":"$name" > "$tmp/$right_dir/$name" || true
> +		fi
> +	done < "$tmp/filelist"

"while read -r name" might make this slightly more robust; even though
this loses leading and trailing whitespaces in filenames, we probably
can get away without worrying about them.

> +else
> +	# Mac users have gnutar rather than tar
> +	(tar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && tar -x)) || {
> +		gnutar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && gnutar -x)
> +	}

What is this "--ignore-failed-read" about?  Not reporting unreadable as an
error smells really bad.

If you require GNUism in your tar usage, this should be made configurable
so that people can use alternative names (some systems come with "tar"
that is POSIX and "gtar" that is GNU).

> +cd "$tmp"
> +LOCAL="$left_dir"
> +REMOTE="$right_dir"
> +
> +if test -n "$diff_tool"
> +then
> +	export BASE
> +	eval $diff_tool '"$LOCAL"' '"$REMOTE"'
> +else
> +	run_merge_tool "$merge_tool" false
> +fi
> +
> +# Copy files back to the working dir, if requested
> +if test -n "$copy_back" && test "$right_dir" = "working_tree"
> +then
> +	cd "$start_dir"
> +	git_top_dir=$(git rev-parse --show-toplevel)
> +	find "$tmp/$right_dir" -type f |
> +	while read file
> +	do
> +		cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
> +	done
> +fi

This will copy new files created in $right_dir.  Is that intended?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html