Re: Rename detection within in files WAS: [PATCH 2/6] t7408: merge short tests, factor out testing method

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 06.08.2016 um 01:26 schrieb Stefan Beller:
When moving code around, we usually get large chunks of text. If the contributor
is not 100% trustworthy, we need to review all the code without much intelectual
joy. Essentially the reviewer is just making sure the parts of the text are the
same.

I'd like to propose a new addition to the diff format that makes this use case
easier. The idea is to mark up lines that were just moved around in the file
instead of adding and removing them.

Currently we have 3 characters that
are allowed to start a line within a hunk:
' ' to indicate context
'+' to add a line
'-' to remove a line

I'd propose to add the following characters:
'*' which is the same as '+', but it indicates that the line was moved
     from somewhere else without change.
'X' The same as '-', with the addition that this line was moved to a different
     place without change.

The patch below uses these new '*' and 'X'. Each hunk that makes use of these
additions, is followed other sections, [moved-from, moved-to] that indicate
where the corresponding line is.

Interesting idea. It should be easy to convert the result into a regular unified diff for consumption with patch(1) or git am/apply by replacing the new flags with + and - and removing the moved-* hunks.

Your example ignores whitespace changes at the start of the line and within it, the added "-C $working_dir", s/expected/expect/; is this all intended? Only a single blank line was moved verbatim.

The moved-from and moved-to hunks make this diff quite verbose.

If multiple lines from different sources are moved to the same hunk then you'd get multiple moved-from hunks following that single destination, right? (Same with lines moved from a single hunk to multiple destinations and moved-to.)

But does it even warrent a new format? It's a display problem; the necessary information is already in the diffs we have today. A graphical diff viewer could connect moved blocks with lines, like http://www.araxis.com/merge/ does in its side-by-side view. A Thunderbird extension (or a bookmarklet or browser extendiion for webmail users) could do that for an email-based workflow.

Still, what about adding information about moved lines as an extended header (like that index line)? Line numbers are included in hunk headers and can serve as orientation. A reader would have to do some mental arithmetic (ugh), but incompatible format changes would be avoided. For your example it should look something like this:

	move from t/t7408-submodule-reference.sh:52,1
	move to t/t7408-submodule-reference.sh:22,1


There are multiple things to tackle when going for such an addition:
* How to present this to the user (it's covered in this email)
* how to find the renamed lines algorithmically.
   (there are already approaches to that, e.g. https://github.com/stefanbeller/duplo
   which is http://duplo.sourceforge.net/ with no substantial additions)

Any comments welcome,

Thanks,
Stefan

---
  t/t7408-submodule-reference.sh | 50 +++++++++++++++---------------------------
  1 file changed, 15 insertions(+), 29 deletions(-), 6 moved lines

diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh
index afcc629..1416cbd 100755
--- a/t/t7408-submodule-reference.sh
+++ b/t/t7408-submodule-reference.sh
@@ -10,6 +10,16 @@ base_dir=$(pwd)

  U=$base_dir/UPLOAD_LOG

+test_alternate_usage()
+{
+	alternates_file=$1
+	working_dir=$2
+	test_line_count = 1 $alternates_file &&
*	echo "0 objects, 0 kilobytes" >expect &&
*	git -C $working_dir count-objects >current &&
*	diff expect current
+}
+

Post-image line 22.

  test_expect_success 'preparing first repository' '
  	test_create_repo A &&
  	(
@@ move-source 42,6 @@ test_expect_success 'that reference gets used with add' '
  test_expect_success 'that reference gets used with add' '
  	(
  		cd super/sub &&
X		echo "0 objects, 0 kilobytes" > expected &&
X		git count-objects > current &&
X		diff expected current
  	)
  '
@@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' '
  	)
  '

-test_expect_success 'submodule add --reference' '
+test_expect_success 'submodule add --reference uses alternates' '
  	(
  		cd super &&
  		git submodule add --reference ../B "file://$base_dir/A" sub &&
  		git commit -m B-super-added
-	)
-'
-

Pre-image line 52.

-test_expect_success 'after add: existence of info/alternates' '
-	test_line_count = 1 super/.git/modules/sub/objects/info/alternates
-'
-
-test_expect_success 'that reference gets used with add' '
-	(
-		cd super/sub &&
X		echo "0 objects, 0 kilobytes" > expected &&
X		git count-objects > current &&
X		diff expected current
-	)
-'
-
-test_expect_success 'cloning superproject' '
-	git clone super super-clone
-'
-
@@ move-to 10,6 @@ test_alternate_usage
+	alternates_file=$1
+	working_dir=$2
+	test_line_count = 1 $alternates_file &&
*	echo "0 objects, 0 kilobytes" >expect &&
*	git -C $working_dir count-objects >current &&
*	diff expect current
+}
+
--
2.9.2.572.g9d9644e.dirty


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]