Re: [PATCH v2] contrib: added git-diffall

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 23 Feb 2012 11:02:59 -0800

Tim Henigan <tim.henigan@xxxxxxxxx> writes:

> On Wed, Feb 22, 2012 at 6:48 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>> Another thing that comes to my mind is if a user has an external tool that
>> can use "diffall", is there ever a situation where the user chooses to use
>> "difftool" instead, to go files one by one. I cannot offhand imagine any.
>
> It was my assumption that any tool that supports directory diff also
> supports individual file diff.  It seems like a strict subset of
> directory diff case.
> ...
>> Perhaps a longer term integration plan may be to lift the logic from this
>>
> ...snip...
>>
>> But that is all two steps in the future.
>
> I hope that this feature finds it way into the existing core commands.
> This script is intended to be a conversation starter that is also
> immediately useful as a separate command.  Would it be better to begin
> the long-term discussion now and skip adding this to contrib?

I would envision we have this in contrib/ first, without even fixing the
whitespace-in-pathspec and whitespace-or-lf-in-paths issues I pointed out
in my review, and let people play with it.

My crystal ball tells an optimist in me that we will see people (you and
others) try to fix issues they hit in their real life use cases, and the
script will be improved while it is still in contrib/.  And then somebody
who has worked on difftool will step up and roll it into difftool proper,
along the lines I hinted in the message you are responding to, at which
point the script will be removed from contrib/.  That is the first step in
the future.

The second step in the future may or may not come.  It will involve adding
an updated external diff interface on the core side and would prepare the
two temporary directories before the core side calls the difftool among
other things, and when that happens, we can lose most of the code in this
script that the first step in the future may have ported into difftool.

>> You can write multiple assignment on a line to save vertical space if you
>> want to, and the initialization sequence like this is a good place to do
>> so.
> My personal preference is to keep them on separate lines.  However if
> the compressed style is preferred, I will change it.

"I wouldn't bother" was what I meant by "if you want to".

>>> +     -x|--e|--ex|--ext|--extc|--extcm|--extcmd)
> ...
>> Don't you want to match "-t/--tool" that "difftool" already uses?
>
> Are you suggesting that I a) change "-x/--extcmd" to "-t/--tool" or
> that b) I add support for the "difftool -t/--tool" option?

No, I just misread the set of options "difftool" takes, without realizing
that --extcmd and --tool are two different things, and it is correct to
call your option "--extcmd" if it specifies what corresponds to what
"difftool --extcmd" specifies.  Sorry for the confusion.

>> Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
>> contents of the Documentation/ directory between the named commit and
>> the working tree, like "diff HEAD~2 Documentation/" does.
>>
>> That is not a show-stopper (a double-dash is an easy workaround), but
>> it is worth pointing out.
>
> So I would need something to determine if a string represents a
> commit/tag/branch or a path?.

"It does not have to be fixed in the first version" (aka "for future
consideration") was what I meant by "not a show-stopper".

>> And this will not work with pathspec that have $IFS characters.  If we
>> really wanted to we could do that by properly quoting "$1" when you build
>> $paths and then use eval when you run "git diff" here (look for 'sq' and
>> 'eval' in existing scripts, e.g. "git-am.sh", if you are interested).
>>
>> Also you may want to write filelist using -z format to protect yourself
>> from paths that contain LF, but that would require the loop "while read
>> name" to be rewritten.
>
> I just discovered that the script fails to handle files that have
> spaces in their name, so some further work is needed.

Again, "It does not have to be fixed in the first version" (aka "for
future consideration") was what I meant by "If we really wanted to".

>> What is this "--ignore-failed-read" about? Not reporting unreadable as an
>> error smells really bad.
>
> If a file was added or deleted between the two commits being compared,
> tar would fail because a file was missing from "$tmp/filelist".  The
> "--ignore-failed-read" prevents tar from halting the script in this
> case.

But it will also ignore errors coming from other causes, no?  Wouldn't we
rather see an error if tar fails to read from a path that *has to* exist
in the working tree because "diff" said it does?

Again, it is just "for future consideration".

>> If you require GNUism in your tar usage, this should be made configurable
>> so that people can use alternative names (some systems come with "tar"
>> that is POSIX and "gtar" that is GNU).
>
> Is there an example showing how this could be configurable?  The
> problem is that the "--ignore-failed-read" was not supported in all
> flavors of tar.

Grep for "$TAR" and also @@DIFF@@ in the Makefile, and add substitution
for @@TAR@@ in cmd_munge_script, perhaps?

By the way, I actually have an even more radical suggestion that may let
you get rid of most of the lines in your script.

If you tweak the usage so that "diffall" specific options *MUST* come
first, e.g.

	USAGE=[--copy-back] [-x <cmd>] <arguments for diff>

then you can parse your argument partially, i.e.

	copy_back= extcmd=
	while case "$#,$1" in 0,*) break ;; *,-*) ;; *) break ;; esac
        do
		case "$1" in
                --copy-back)
                	copy_back=true
			;;
                -x | --extcmd)
			test $# != 1 || usage
			extcmd=$2
                        shift
			;;
		*)
                	break
                        ;;
		esac
                shift
	done

then feed the remainder all to "diff", e.g.

	diff --raw --no-abbrev "$@" |

And then you can prepare two temporary index files and stuff the output in
them, by having something like this on the downstream of the pipe:

	while read -r lmode rmode lsha1 rsha1 status path
        do
        	if test "$lmode" != $null_mode
		then
                	GIT_INDEX_FILE=$tmp.left_index \
                        git update-index --add --cacheinfo $lmode $lsha1 $path
		fi
        	if test "$rmode" != $null_mode
		then
                	GIT_INDEX_FILE=$tmp.right_index \
                        git update-index --add --cacheinfo $rmode $rsha1 $path
		fi
	done

Side Note:
	In the production version, you would probably give the "-z" option
	to "diff", and write this loop in Perl so that you can cope better
	with funny characters in the path.  Instead of running two
	instances of "git update-index" for every path, the loop would
	group the entries for left and right side, and drive one instance
	of "git update-index --index-info" each to populate the two index
	files.

	Also the above needs to be adjusted to deal with the side that
	represents the working tree files; they are reported with $null_sha1
	so in such a case instead of putting it in the temporary index,
        you would copy the working tree file to the temporary location.

After you prepare these two temporary index files, you can then use them
to populate your left_dir and right_dir like this:

	GIT_DIR=$(git rev-parse --git-dir) \
	GIT_WORK_TREE=$left_dir \
        GIT_INDEX_FILE=$tmp.left_index \
        git checkout-index -a

With this, you do not have to worry about anything about the funny
combinations of where the two "directories" comes from when preparing
the temporary directories to be compared.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html