Re: [PATCH 2/6] t7408: merge short tests, factor out testing method

Stefan Beller <sbeller@xxxxxxxxxx> · Fri, 5 Aug 2016 15:45:13 -0700

On Fri, Aug 5, 2016 at 1:45 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> Tests consisting of one line each can be consolidated to have fewer tests
>> to run as well as fewer lines of code.
>>
>> When having just a few git commands, do not create a new shell but
>> use the -C flag in Git to execute in the correct directory.
>
> Good motivations.
>
>> Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx>
>> ---
>>  t/t7408-submodule-reference.sh | 50 +++++++++++++++---------------------------
>>  1 file changed, 18 insertions(+), 32 deletions(-)
>>
>> diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh
>> index afcc629..1416cbd 100755
>> --- a/t/t7408-submodule-reference.sh
>> +++ b/t/t7408-submodule-reference.sh
>> @@ -10,6 +10,16 @@ base_dir=$(pwd)
>>
>>  U=$base_dir/UPLOAD_LOG
>
> Is this line needed anywhere?
>
> We (perhaps unfortunately) still need $base_dir because we want to
> give an absolute file:/// URL to "submodule add", but I do not think
> we use $U, so let's get rid of it.
>
>> +test_alternate_usage()
>> +{
>
> According to Documentation/CodingGuidelines, this should be:
>
>     test_alternate_usage () {
>
> Somehow the helper name sounds as if it is testing if an alternate
> is used correctly (i.e. the machinery may attempt to use alternate
> but not in a correct way), not testing if an alternate is correctly
> used (i.e. the machinery incorrectly forgets to use an alternate at
> all), but maybe it is just me.
>
>> +     alternates_file=$1
>> +     working_dir=$2
>
> These are good (they can be on a single line), but you would
> want &&-chain just as other lines.
>
>> +     test_line_count = 1 $alternates_file &&
>
> This needs to quote "$alternates_file" especially in a helper
> function you have no control over future callers of.
>
> I wonder if we want to check the actual contents of the alternate;
> it may force us to worry about the infamous "should we expect
> $(pwd)/something or $PWD/something" if we did so, so it is not a
> strong suggestion.
>
>> +     echo "0 objects, 0 kilobytes" >expect &&
>> +     git -C $working_dir count-objects >current &&
>> +     diff expect current
>
> It is more customary to name two "expect" vs "actual", and compare
> them using "test_cmp" not "diff".
>
>> +}
>> +
>>  test_expect_success 'preparing first repository' '
>>       test_create_repo A &&
>>       (
>> @@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' '
>>       )
>>  '
>>
>> -test_expect_success 'submodule add --reference' '
>> +test_expect_success 'submodule add --reference uses alternates' '
>>       (
>>               cd super &&
>>               git submodule add --reference ../B "file://$base_dir/A" sub &&
>>               git commit -m B-super-added
>> -     )
>> -'
>> -
>> -test_expect_success 'after add: existence of info/alternates' '
>> -     test_line_count = 1 super/.git/modules/sub/objects/info/alternates
>> -'
>> -
>> -test_expect_success 'that reference gets used with add' '
>> -     (
>> -             cd super/sub &&
>> -             echo "0 objects, 0 kilobytes" > expected &&
>> -             git count-objects > current &&
>> -             diff expected current

This is where the "diff" and "current" above came from.

>> -     )
>> -'
>
> Completely unrelated tangent, but after seeing the "how would we
> make a more intelligent choice of the diff boundary" topic, I
> wondered if we can notice that at this point there is a logical
> boundary and do something intelligent about it.  All the removed
> lines above have become "test_alternate" we see below, while all the
> removed lines below upto "test_alternate" correspond to the updated
> test at the end.

I guess that would require even more knowledge of the underlying
content that we track in Git.

Originally I started the diff boundary topic, as I assumed that the
new line detection is not doing harm. What Michael came up with
is impressive though I fear there might be a selection bias in the corpus,
i.e. we are missing some projects that get worse by it and these projects
would have had a great influence on the selection of the tuning parameters.
I guess we'll just wait until someone speaks up and points at worse
examples there.

What you're asking here is a complete new ballpark IMHO
as it takes diff to a whole new (syntactical) level. As of now
the world agrees that '\n' seems to be a good character to
put in text documents, so we use it as a splitting token
in our underlying diff driver, i.e. the diffs are primarily by line,
no matter how many characters change in that line. Sometimes
there is a "secondary" aspect such as coloring and pointing out
the characters that changed in a line[1].

[1] random example:
https://gerrit-review.googlesource.com/#/c/79354/3/project.py
Visually we focus on the changed characters, but the underlying
diff is still "by line".

We could write another "diff driver" that would not segment the
text by '\n' as a primary method and then showing the changed
hunks with +/-, but it would try to find rearrangements in the
underlying text:

As an example, this patch with such a diff driver could read partially as:
----8<----
***diff-driver: moved-text line 42 -> 14, length:7, post-operations: 1
 test_expect_success 'that reference gets used with add' '
      (
              cd super/sub &&
*             echo "0 objects, 0 kilobytes" > expected &&
*             git count-objects > current &&
*             diff expected current
      )
***diff-driver:post-operation 1:
*** modify moved text with:

+test_alternate_usage() {
+        alternates_file="$1" &&
+        working_dir="$2" &&
*             echo "0 objects, 0 kilobytes" > expected &&
*             git count-objects > current &&
*             diff expected current
+ }

***diff-driver: deletion 40,50

 -     )
 -'
 -
 -test_expect_success 'after add: existence of info/alternates' '
 -     test_line_count = 1 super/.git/modules/sub/objects/info/alternates
 -'
 -
----8<----

Thinking about this further, this would not require knowledge of
the underlying content, it is actually "just" an intra-file-rename
detection.

We have a rename detection from one blob to another, so we
could also have a similar thing to deduplicate within one blob,
which would track moving code around.

>
>> -test_expect_success 'cloning superproject' '
>> -     git clone super super-clone
>> -'
>> -
>> -test_expect_success 'update with reference' '
>> -     cd super-clone && git submodule update --init --reference ../B
>> -'
>> -
>> -test_expect_success 'after update: existence of info/alternates' '
>> -     test_line_count = 1 super-clone/.git/modules/sub/objects/info/alternates
>> +     ) &&
>> +     test_alternate_usage super/.git/modules/sub/objects/info/alternates super/sub
>>  '
>>
>> -test_expect_success 'that reference gets used with update' '
>> -     cd super-clone/sub &&
>> -     echo "0 objects, 0 kilobytes" > expected &&
>> -     git count-objects > current &&
>> -     diff expected current
>> +test_expect_success 'updating superproject keeps alternates' '
>> +     test_when_finished "rm -rf super-clone" &&
>
> This one is new; we do not remove A, B or super.  Does it matter if
> we leave super-clone behind?  Is super-clone so special?

"We need it in a later test."

It comes down to philosophy of how to write tests.

I spent some time in 740* and this is a surprising short test.
Compare to 7404 and 7400 for example. These are very long,
so when you want to add another test (no matter if testing for a
fixed regression or a new feature), you have lots of repos like

    super, super2, super3, sub, sub1, sumodule, anothersub,

and none of the names make sense. (Can I reuse them?
do they have some weird corner case configuration
that I need to undo? etc)

To stop that from happening again I want to have a clean slate,
i.e. all clones are deleted shortly after using, so it is obvious what
to use for testing.

>
>> +     git clone super super-clone &&
>> +     git -C super-clone submodule update --init --reference ../B &&
>> +     test_alternate_usage super-clone/.git/modules/sub/objects/info/alternates super-clone/sub
>>  '
>>
>>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html