On Fri, Aug 5, 2016 at 1:45 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Stefan Beller <sbeller@xxxxxxxxxx> writes: > >> Tests consisting of one line each can be consolidated to have fewer tests >> to run as well as fewer lines of code. >> >> When having just a few git commands, do not create a new shell but >> use the -C flag in Git to execute in the correct directory. > > Good motivations. > >> Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx> >> --- >> t/t7408-submodule-reference.sh | 50 +++++++++++++++--------------------------- >> 1 file changed, 18 insertions(+), 32 deletions(-) >> >> diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh >> index afcc629..1416cbd 100755 >> --- a/t/t7408-submodule-reference.sh >> +++ b/t/t7408-submodule-reference.sh >> @@ -10,6 +10,16 @@ base_dir=$(pwd) >> >> U=$base_dir/UPLOAD_LOG > > Is this line needed anywhere? > > We (perhaps unfortunately) still need $base_dir because we want to > give an absolute file:/// URL to "submodule add", but I do not think > we use $U, so let's get rid of it. > >> +test_alternate_usage() >> +{ > > According to Documentation/CodingGuidelines, this should be: > > test_alternate_usage () { > > Somehow the helper name sounds as if it is testing if an alternate > is used correctly (i.e. the machinery may attempt to use alternate > but not in a correct way), not testing if an alternate is correctly > used (i.e. the machinery incorrectly forgets to use an alternate at > all), but maybe it is just me. > >> + alternates_file=$1 >> + working_dir=$2 > > These are good (they can be on a single line), but you would > want &&-chain just as other lines. > >> + test_line_count = 1 $alternates_file && > > This needs to quote "$alternates_file" especially in a helper > function you have no control over future callers of. > > I wonder if we want to check the actual contents of the alternate; > it may force us to worry about the infamous "should we expect > $(pwd)/something or $PWD/something" if we did so, so it is not a > strong suggestion. > >> + echo "0 objects, 0 kilobytes" >expect && >> + git -C $working_dir count-objects >current && >> + diff expect current > > It is more customary to name two "expect" vs "actual", and compare > them using "test_cmp" not "diff". > >> +} >> + >> test_expect_success 'preparing first repository' ' >> test_create_repo A && >> ( >> @@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' ' >> ) >> ' >> >> -test_expect_success 'submodule add --reference' ' >> +test_expect_success 'submodule add --reference uses alternates' ' >> ( >> cd super && >> git submodule add --reference ../B "file://$base_dir/A" sub && >> git commit -m B-super-added >> - ) >> -' >> - >> -test_expect_success 'after add: existence of info/alternates' ' >> - test_line_count = 1 super/.git/modules/sub/objects/info/alternates >> -' >> - >> -test_expect_success 'that reference gets used with add' ' >> - ( >> - cd super/sub && >> - echo "0 objects, 0 kilobytes" > expected && >> - git count-objects > current && >> - diff expected current This is where the "diff" and "current" above came from. >> - ) >> -' > > Completely unrelated tangent, but after seeing the "how would we > make a more intelligent choice of the diff boundary" topic, I > wondered if we can notice that at this point there is a logical > boundary and do something intelligent about it. All the removed > lines above have become "test_alternate" we see below, while all the > removed lines below upto "test_alternate" correspond to the updated > test at the end. I guess that would require even more knowledge of the underlying content that we track in Git. Originally I started the diff boundary topic, as I assumed that the new line detection is not doing harm. What Michael came up with is impressive though I fear there might be a selection bias in the corpus, i.e. we are missing some projects that get worse by it and these projects would have had a great influence on the selection of the tuning parameters. I guess we'll just wait until someone speaks up and points at worse examples there. What you're asking here is a complete new ballpark IMHO as it takes diff to a whole new (syntactical) level. As of now the world agrees that '\n' seems to be a good character to put in text documents, so we use it as a splitting token in our underlying diff driver, i.e. the diffs are primarily by line, no matter how many characters change in that line. Sometimes there is a "secondary" aspect such as coloring and pointing out the characters that changed in a line[1]. [1] random example: https://gerrit-review.googlesource.com/#/c/79354/3/project.py Visually we focus on the changed characters, but the underlying diff is still "by line". We could write another "diff driver" that would not segment the text by '\n' as a primary method and then showing the changed hunks with +/-, but it would try to find rearrangements in the underlying text: As an example, this patch with such a diff driver could read partially as: ----8<---- ***diff-driver: moved-text line 42 -> 14, length:7, post-operations: 1 test_expect_success 'that reference gets used with add' ' ( cd super/sub && * echo "0 objects, 0 kilobytes" > expected && * git count-objects > current && * diff expected current ) ***diff-driver:post-operation 1: *** modify moved text with: +test_alternate_usage() { + alternates_file="$1" && + working_dir="$2" && * echo "0 objects, 0 kilobytes" > expected && * git count-objects > current && * diff expected current + } ***diff-driver: deletion 40,50 - ) -' - -test_expect_success 'after add: existence of info/alternates' ' - test_line_count = 1 super/.git/modules/sub/objects/info/alternates -' - ----8<---- Thinking about this further, this would not require knowledge of the underlying content, it is actually "just" an intra-file-rename detection. We have a rename detection from one blob to another, so we could also have a similar thing to deduplicate within one blob, which would track moving code around. > >> -test_expect_success 'cloning superproject' ' >> - git clone super super-clone >> -' >> - >> -test_expect_success 'update with reference' ' >> - cd super-clone && git submodule update --init --reference ../B >> -' >> - >> -test_expect_success 'after update: existence of info/alternates' ' >> - test_line_count = 1 super-clone/.git/modules/sub/objects/info/alternates >> + ) && >> + test_alternate_usage super/.git/modules/sub/objects/info/alternates super/sub >> ' >> >> -test_expect_success 'that reference gets used with update' ' >> - cd super-clone/sub && >> - echo "0 objects, 0 kilobytes" > expected && >> - git count-objects > current && >> - diff expected current >> +test_expect_success 'updating superproject keeps alternates' ' >> + test_when_finished "rm -rf super-clone" && > > This one is new; we do not remove A, B or super. Does it matter if > we leave super-clone behind? Is super-clone so special? "We need it in a later test." It comes down to philosophy of how to write tests. I spent some time in 740* and this is a surprising short test. Compare to 7404 and 7400 for example. These are very long, so when you want to add another test (no matter if testing for a fixed regression or a new feature), you have lots of repos like super, super2, super3, sub, sub1, sumodule, anothersub, and none of the names make sense. (Can I reuse them? do they have some weird corner case configuration that I need to undo? etc) To stop that from happening again I want to have a clean slate, i.e. all clones are deleted shortly after using, so it is obvious what to use for testing. > >> + git clone super super-clone && >> + git -C super-clone submodule update --init --reference ../B && >> + test_alternate_usage super-clone/.git/modules/sub/objects/info/alternates super-clone/sub >> ' >> >> test_done -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html