Re: [PATCH v3 02/11] bundle: verify using check_connected()

Derrick Stolee <derrickstolee@xxxxxxxxxx> · Tue, 31 Jan 2023 14:31:08 -0500

On 1/31/2023 12:35 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

>> +	/* TODO: preserve this verbose language. */
> 
> I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?

Sorry, I put this in so I wouldn't forget to test that the
verbose message sticks, but I did forget to remove it.

>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>> index 38dbbf89155..7d40994991e 100755
>> --- a/t/t6020-bundle-misc.sh
>> +++ b/t/t6020-bundle-misc.sh
>> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
>>  		# Verify should fail
>>  		test_must_fail git bundle verify \
>>  			../clone-from/tip.bundle 2>err &&
>> -		grep "Could not read $BAD_OID" err &&
>> -		grep "Failed to traverse parents of commit $TIP_OID" err &&
>> +		grep "some prerequisite commits .* are not connected" err &&
>> +		test_line_count = 1 err &&
>>  
>>  		# Unbundling should fail
>>  		test_must_fail git bundle unbundle \
>>  			../clone-from/tip.bundle 2>err &&
>> -		grep "Could not read $BAD_OID" err &&
>> -		grep "Failed to traverse parents of commit $TIP_OID" err
>> +		grep "some prerequisite commits .* are not connected" err &&
>> +		test_line_count = 1 err
>>  	)
>>  '
> 
> Especially with the new test added in the previous step, we know we
> are not trading correctness off.  Excellent.
> 
> I wonder how much the performance hit does this version incur over
> the "not safe at all" version and over the "use custom and
> stricter-than-needed" version, by the way?

I was able to simulate this in an extreme situation by taking a clone
of the normal Git fork, creating a ref at every first parent, then
creating a bundle of the difference between git/git and gitster/git.
Finally, I compared the performance of 'git bundle verify' on Git
compiled before this change and after this change:

Benchmark 1: old
  Time (mean ± σ):     324.7 ms ±   7.5 ms    [User: 228.0 ms, System: 95.7 ms]
  Range (min … max):   315.3 ms … 338.0 ms    10 runs

Benchmark 2: new
  Time (mean ± σ):     331.2 ms ±  20.2 ms    [User: 230.6 ms, System: 99.7 ms]
  Range (min … max):   302.8 ms … 360.2 ms    10 runs

Summary
  'old' ran
    1.02 ± 0.07 times faster than 'new'

So, there is a performance difference in the two situations, but it
is very slight, in what I can detect.

Of course, there is the case where the behavior differs because of
the more permissible behavior, but I expect the old algorithm to be
slower than the new case, because check_connected() can terminate
with success (due to seeing all the prerequisite commits) faster
than the old walk can terminate with failure (due to walking all
reachable commits).

Thanks,
-Stolee