Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 20 2021, Jeff King wrote:

> On Wed, Jan 20, 2021 at 09:50:19AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> To elaborate on other things that aren't really your problem & Taylor's
>> E-Mail downthread we originally added this because:
>> 
>>     If new annotated tags have been introduced then we can also include
>>     them in the packfile, saving the client from needing to request them
>>     through a second connection.
>> 
>> I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
>> --prune-tags option and fetch.pruneTags config, 2018-02-09) but really
>> not this specific area.
>> 
>> I wonder if longer term simply moving this to be work the client does
>> wouldn't make more sense. I.e. if we just delete this for_each_ref()
>> loop.
>>     
>> We're just saving a client from saying they "want" a tag. I.e. with the
>> whole thing removed we need this to make t5503-tagfollow.sh pass (see
>> [1] at the end for the dialog, the tag is 3253df4d...):
>
> Isn't this an ordering problem, though? The client cannot mention
> auto-followed tags in a "want", because they first need to "want" the
> main history, receive the pack, and then realize they want the others.
>
> So we are not just saving the client from sending a "want", but making a
> second full connection to do so. That seems to be an optimization worth
> continuing to have.

Correct. Suppose a history COMMIT(TAG) history like:

    mainline: A(X) -> B(Y) -> C
    topic:    A(X) -> B(Y) -> D(Z)

You only want the "mainline" history and its tags in your fetch. Now a
server will give you tags X & Y for free, ignoring Z.

The note in protocol-capabilities.txt supposes that you'll need to only
get A/B/C in one fetch, then do another one where you see from the OIDs
you have and advertised (peeled) OIDs for the tags and know you just
need X/Y, not Z.

So we could also just fetch Z, then do our own walk on the client to see
if it's reachable, and throw it away if not. Although I suppose that'll
need a list of "bad" tags on the client so we don't repeat the process
the next time around, hrm...

>>     diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
>>     index 6041a4dd32..1ddc430aef 100755
>>     --- a/t/t5503-tagfollow.sh
>>     +++ b/t/t5503-tagfollow.sh
>>     @@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
>>      test_expect_success 'setup expect' '
>>      cat - <<EOF >expect
>>      want $B
>>     +want $T
>>      want $S
>>      EOF
>>      '
>>     @@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
>>                     cd clone2 &&
>>                     git init &&
>>                     git remote add origin .. &&
>>     +               git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
>>                     GIT_TRACE_PACKET=$UPATH git fetch &&
>>                     test $B = $(git rev-parse --verify origin/master) &&
>>                     test $S = $(git rev-parse --verify tag2) &&
>> 
>> We're also saving the client the work of having to go through
>> refs/tags/* and figure out whether there are tags there that aren't on
>> our main history.
>
> You seem to be against auto-following at all. And certainly I can see an
> argument that it is not worth the trouble it causes. But it is the
> default behavior, and I suspect many people are relying on it. Fetching
> every tag indiscriminately is going to grab a bunch of extra unwanted
> objects in some repos. An obvious case is any time "clone
> --single-branch --depth" is used.

I wonder if the use-cases for --depth in the wild aren't 99.9%
--depth=1, in which case the peeled commit on the main branch being
advertised by a tag would alredy give you this information.

I.e. in the common case you don't get a tag, but if you happen to land
on a release you can see that the commit at the tip is tagged. I wonder
if doing that shortcut already in the client (so not send include-tag)
is a useful micro-optimization.

> Maybe I'm not quite understanding what you're proposing.

Not much really, just that looking deeper into this area might be a
useful exercise. I.e. it's a case of server<->client cooperation where
we offlod the work to one or the other based on an old design decision,
maybe that trade-off is not optimal in most cases anymore.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux