Re: [BUGREPORT] Why is git-push fetching content?

Thanks for the replies. I'd like to bump this up again. This has come up
in a new context and I don't see a viable workaround for us that doesn't
involve a rewrite of the process and an excessive amount of new
infrastructure.

I have a feeling this is somehow a general issue with promisor remotes,
though I don't know enough about how they work to know where to start
investigation. I've got what I believe to be minimal reproduction steps
below.

Tao Klerks <tao@xxxxxxxxxx> writes:
> On Wed, Feb 22, 2023 at 4:45 PM Sean Allred <allred.sean@xxxxxxxxx> wrote:
>> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>> > It's hard to know for certain what's going on here, but it depends on
>> > your history.  You did a partial clone with no trees, so you've likely
>> > received a single commit object and no trees or blobs.
>>
>> Yup, this was the intention behind `--depth=1 --filter=tree:0`. The
>> server doing this ref update needs to be faster than having the full
>> history would allow.
>>
>
> FWIW, you're not alone - we do exactly the same thing, for the same
> reasons, and get the same outcome: We want to create a tag in a CI
> job, that particular CI job has no reason to check out the code, all
> we know is we want ref XXXXX to point to commit YYYYY.
>
> [...]
>
> In our case it's still better than any alternative we've found, but
> wastes a few seconds that we'd love to see optimized away.

Unfortunately in our case, 'a few seconds' is tens of minutes (I'm
working with a repository of several million commits) and is timing out
the remote host.

----

I devised some minimal steps to reproduce what I believe to be a related
issue: rev-list fetching content. I've prepared a public repository on
github.com to demonstrate, but you should be able to recreate this
repository if needed by just making a handful of commits to a couple
arbitrary files.

    (cwd:tmp)
    $ git clone --no-checkout --depth=1 --no-tags --filter=tree:0 https://github.com/vermiculus/testibus.git
    Cloning into 'testibus'...
    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
    Receiving objects: 100% (1/1), done.

Sweet, I've only received one object from the remote. This matches what
I asked for: a treeless, blobless fetch of a single commit. Let's
double-check.

    (cwd:testibus)
    $ git fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2/2), done.

I have two objects? How'd that second one get in there? What is it?
Let's try to find out...
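As an aside, one way to see what is physically in the object store
without walking any history is `git cat-file --batch-all-objects`. This
is a minimal sketch, not something run against the clone above; the
function name `list_local_objects` is made up for illustration.

```shell
# Print "<oid> <type> <size>" for every object that exists locally
# (loose or packed). Nothing is traversed, so only objects already on
# disk are reported.
list_local_objects() {
    git -C "$1" cat-file --batch-all-objects --batch-check
}
```

Calling `list_local_objects testibus` right after the clone should show
which object types were actually stored.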

    (cwd:testibus)
    $ git rev-list --objects --all
    d86642e7ae089b69e8a0b20a3e39337435833f92

Alright, I've got the commit object. That makes sense.

    c0fa909c5f67047abc027d9b06e1352954ee33f7

Weird, I also got the tree on the commit, even though I specified that
this should be a treeless clone.

    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
    Receiving objects: 100% (1/1), 54 bytes | 54.00 KiB/s, done.
    94b334d80405218e281a6f5b48d31f73cd3af4be file

Woah woah! All I did was rev-list; why are we fetching content?

This is why I believe this is related to the push issue I'm ultimately
facing -- I'm not familiar with the specifics, but it stands to reason
that git-push needs to (somehow) iterate through objects in order to
negotiate a packfile with the remote. I suspect these two issues have
the same root cause.
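For comparison, rev-list has a `--missing=<action>` option. With
`--missing=print`, objects that are absent locally are listed with a
leading `?` rather than fetched on demand. The sketch below assumes
that behaviour applies to this clone, which has not been checked here;
the function names are hypothetical.

```shell
# Walk everything reachable from all refs without asking the promisor
# remote for anything. Objects that are absent locally are printed
# with a leading "?".
list_reachable_offline() {
    git -C "$1" rev-list --objects --all --missing=print
}

# Print only the ids of the missing objects (the "?"-prefixed lines).
list_missing() {
    list_reachable_offline "$1" | sed -n 's/^?//p'
}
```

If `list_missing testibus` prints the tree id and produces no "remote:"
output, the traversal itself is not what forces the fetch. This does not
help git-push directly, since push offers no equivalent switch that I
know of.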

I believe the following can be used with git-bisect to determine if this
truly ever worked or is a regression:

    setup:
        #!/bin/bash

        repo="https://github.com/vermiculus/testibus.git"
        repo_dir="$HOME/path/to/repo"  # "~" does not expand inside quotes

        git clone --no-checkout --depth=1 --no-tags --filter=tree:0 "$repo" "$repo_dir"
        git -C "$repo_dir" remote set-url origin unreachable

    bisect script:
        repo_dir="$HOME/path/to/repo"
        # git dies with 128, which aborts 'git bisect run'; report 1 (bad) instead
        git -C "$repo_dir" rev-list --objects --all || exit 1

        (obviously using the just-built git)

I'm going to start running this bisect, but I suspect it will take a
while, so I wanted to get this out there.
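For anyone who wants to run the same bisect, here is a rough sketch of
a step for `git bisect run`. It assumes the bisect runs from the root of
a git.git checkout, that a plain `make` builds it, and that the built
binary is reachable through `bin-wrappers/git`. The function names and
the `-j4` job count are placeholders. Note that a failing rev-list is
treated as "tried to fetch", although it could fail for other reasons.

```shell
# Return 0 (good) if rev-list completes, 1 (bad) otherwise.
# "$1" is the git binary under test, "$2" is the clone whose origin URL
# was pointed at something unreachable. Any failure is mapped to 1
# because 'git bisect run' aborts on exit codes above 127.
rev_list_stays_local() {
    "$1" -C "$2" rev-list --objects --all >/dev/null 2>&1 || return 1
}

# One bisect step: build the current commit, then test it.
# 125 tells 'git bisect run' to skip a commit that cannot be built.
bisect_step() {
    make -j4 >/dev/null 2>&1 || return 125
    rev_list_stays_local "$PWD/bin-wrappers/git" "$1"
}
```

Saved in a script that ends with `bisect_step "$HOME/path/to/repo"; exit $?`,
this can be handed to `git bisect run` after marking the good and bad
endpoints.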

--
Sean Allred



