Re: [PATCH v2] contrib: added git-diffall

Tim Henigan <tim.henigan@xxxxxxxxx> · Thu, 23 Feb 2012 11:07:17 -0500

On Wed, Feb 22, 2012 at 6:48 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Tim Henigan <tim.henigan@xxxxxxxxx> writes:
>
> We encourage our log messages to describe the problem first and then
> present solution to the problem, so I would update the above perhaps like
> this:
>
>        The 'git difftool' command lets the user to use an external tool
>        to view diffs, but it runs the tool for one file at the time. This
>        makes it tedious to review a change that spans multiple files.
>
>        The "git-diffall" script instead prepares temporary directories
>        with preimage and postimage files, and launches a single instance
>        of an external diff tool to view the differences in them.
>        diff.tool or merge.tool configuration variable is used to specify
>        what external tool is used.

Understood.  I will update in v3.

> I am wondering if reusing "diff.tool" or "merge.tool" is a good idea,
> though.
>
> I guess that it is OK to assume that any external tool that can compare
> two directories MUST be able to compare two individual files, and if that
> is true, it is perfectly fine to reuse the configuration.  But if an
> external tool "frobdiff" that can compare two directories cannot compare
> two individual files, it will make it impossible for the user to run "git
> difftool" if diff.tool is set to "frobdiff" to use with "diffall".
>
> Another thing that comes to my mind is if a user has an external tool that
> can use "diffall", is there ever a situation where the user chooses to use
> "difftool" instead, to go files one by one.  I cannot offhand imagine any.

It was my assumption that any tool that supports directory diff also
supports individual file diff.  It seems like a strict subset of
directory diff case.

> Perhaps a longer term integration plan may be to lift the logic from this
>
...snip...
>
> But that is all two steps in the future.

I hope that this feature finds it way into the existing core commands.
 This script is intended to be a conversation starter that is also
immediately useful as a separate command.  Would it be better to begin
the long-term discussion now and skip adding this to contrib?

>> +# mktemp is not available on all platforms (missing from msysgit)
>> +# Use a hard-coded tmp dir if it is not available
>> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
>> +     tmp=/tmp/git-diffall-tmp
>> +}
>
> It would not withstand malicious attacks, but doing
>
>        tmp=/tmp/git-diffall-tmp.$$
>
> would at least protect you from accidental name crashes better in the
> fallback codepath.

This makes sense.  I will add a unique portion to the name of the tmp dir in v3.

>> +trap 'rm -rf "$tmp" 2>/dev/null' EXIT
>
> Do you need to suppress errors, especially when you are running "rm -rf"
> with the 'f' option?

On msysgit, I found that "rm -rf $tmp" consistently fails due with a
permission error.  I don't understand why the script that created the
tmp dir is not allowed to delete it.  I am still looking into it, but
so far it appears to be an idiosyncrasy of msysgit.

>> +left=
>> +right=
>> +paths=
>> +path_sep=
>> +compare_staged=
>> +common_ancestor=
>> +left_dir=
>> +right_dir=
>> +diff_tool=
>> +copy_back=
>
> You can write multiple assignment on a line to save vertical space if you
> want to, and the initialization sequence like this is a good place to do
> so.

My personal preference is to keep them on separate lines.  However if
the compressed style is preferred, I will change it.

>> +     -x|--e|--ex|--ext|--extc|--extcm|--extcmd)
>> +             diff_tool=$2
>> +             shift
>> +             ;;
>
> What if your command line ends with -x without $2?

Currently it results in a shift error with no useful message to the
user.  I will add something for this in v3.

> Don't you want to match "-t/--tool" that "difftool" already uses?

Are you suggesting that I a) change "-x/--extcmd" to "-t/--tool" or
that b) I add support for the "difftool -t/--tool" option?

If "a", I was reusing the "difftool --extcmd" option which has the
same behavior.  If "b", I will look into it.

>> +             # could be commit, commit range or path limiter
>> +             case "$1" in
>> +             *...*)
>> +                     left=${1%...*}
>> +                     right=${1#*...}
>> +                     common_ancestor=1
>> +                     ;;
>
> Strictly speaking, that is not just a common_ancestor but is a merge_base,
> which is a common ancestor none of whose children is a common ancestor.

Understood.  I will change the name to "merge_base" in v3.

>> +             *..*)
>> +                     left=${1%..*}
>> +                     right=${1#*..}
>> +                     ;;
>> +             *)
>> +                     if test -n "$path_sep"
>> +                     then
>> +                             paths="$paths$1 "
>> +                     elif test -z "$left"
>> +                     then
>> +                             left=$1
>> +                     elif test -z "$right"
>> +                     then
>> +                             right=$1
>> +                     else
>> +                             paths="$paths$1 "
>> +                     fi
>> +                     ;;
>> +             esac
>
> Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
> contents of the Documentation/ directory between the named commit and
> the working tree, like "diff HEAD~2 Documentation/" does.
>
> That is not a show-stopper (a double-dash is an easy workaround), but
> it is worth pointing out.

So I would need something to determine if a string represents a
commit/tag/branch or a path?.  I presume it would need to handle the
corner case where a branch/tag and path have the same name.  Is there
anything like this in the mergetool lib scripts today?

>> +     then
>> +             git diff --name-only "$left"..."$right" -- $paths > "$tmp/filelist"
>> +     else
>> +             git diff --name-only "$left" "$right" -- $paths > "$tmp/filelist"
>> +     fi
>
> And this will not work with pathspec that have $IFS characters.  If we
> really wanted to we could do that by properly quoting "$1" when you build
> $paths and then use eval when you run "git diff" here (look for 'sq' and
> 'eval' in existing scripts, e.g. "git-am.sh", if you are interested).
>
> Also you may want to write filelist using -z format to protect yourself
> from paths that contain LF, but that would require the loop "while read
> name" to be rewritten.

I just discovered that the script fails to handle files that have
spaces in their name, so some further work is needed.

>> +# Exit immediately if there are no diffs
>> +if test ! -s "$tmp/filelist"
>> +then
>> +     exit 0
>> +fi
>
> Ok, you have trap set already so $tmp will disappear with this exit ;-)

I'll try not to be such a slow learner in the future...but no guarantees.

>> +if test -n "$copy_back" && test "$right_dir" != "working_tree"
>> +then
>> +     echo "--copy-back is only valid when diff includes the working tree."
>> +     exit 1
>> +fi
>
> I actually wondered why $right_dir needs to be populated with a copy in
> the first place (if you do not copy but give the working tree itself to
> the external tool, you do not even have to copy back).
>
> I know the answer to the question, namely, "because the external tool
> thinks files that are not in $left_dir are added files", but if you can
> find a way to tell the external tool to ignore new files (similar to how
> "diff -r" without -N works), running the tool with temporary left_dir and
> the true workdir as right_dir would be a lot cleaner solution to the
> problem.

I'll note this as "for future consideration".  I spent some time
trying this is the original implementation, but could not find a
workable solution in the time I had available.

>> +     while read name
>> +     do
>> +             ls_list=$(git ls-tree $right $name)
>> +             if test -n "$ls_list"
>> +             then
>> +                     mkdir -p "$tmp/$right_dir/$(dirname "$name")"
>> +                     git show "$right":"$name" > "$tmp/$right_dir/$name" || true
>> +             fi
>> +     done < "$tmp/filelist"
>
> "while read -r name" might make this slightly more robust; even though
> this loses leading and trailing whitespaces in filenames, we probably
> can get away without worrying about them.
>
>> +else
>> +     # Mac users have gnutar rather than tar
>> +     (tar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && tar -x)) || {
>> +             gnutar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && gnutar -x)
>> +     }
>
> What is this "--ignore-failed-read" about?  Not reporting unreadable as an
> error smells really bad.

If a file was added or deleted between the two commits being compared,
tar would fail because a file was missing from "$tmp/filelist".  The
"--ignore-failed-read" prevents tar from halting the script in this
case.

> If you require GNUism in your tar usage, this should be made configurable
> so that people can use alternative names (some systems come with "tar"
> that is POSIX and "gtar" that is GNU).

Is there an example showing how this could be configurable?  The
problem is that the "--ignore-failed-read" was not supported in all
flavors of tar.

>> +# Copy files back to the working dir, if requested
>> +if test -n "$copy_back" && test "$right_dir" = "working_tree"
>> +then
>> +     cd "$start_dir"
>> +     git_top_dir=$(git rev-parse --show-toplevel)
>> +     find "$tmp/$right_dir" -type f |
>> +     while read file
>> +     do
>> +             cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
>> +     done
>> +fi
>
> This will copy new files created in $right_dir.  Is that intended?

hmmm...that was not intended.  If would be odd for the user to create
new files in this tmp directory, but if the diff tool automatically
generates any files then this could result in unwanted files.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html