Funnies with "git fetch"

I just did this in an empty directory.

    $ git init src
    $ cd src
    $ echo hello >greetings ; git add . ; git commit -m greetings
    $ S=$(git rev-parse :greetings | sed -e 's|^..|&/|')
    $ X=$(echo bye | git hash-object -w --stdin | sed -e 's|^..|&/|')
    $ mv -f .git/objects/$X .git/objects/$S

The tip commit _thinks_ it has "greetings" that contains "hello", but
somebody replaced it with a corrupt "bye" whose contents do not match the
object name it is stored under.
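
As an aside, the sed expression in the setup above only turns an object
name into the path of its loose object file under .git/objects, by
inserting a slash after the first two hex digits. A small sketch of that
step on its own (loose_path is a made-up helper name for illustration,
not a git command):

```shell
# loose_path is a hypothetical helper, not part of git.
# It maps an object name to its loose object path relative to .git/objects.
loose_path () {
	echo "$1" | sed -e 's|^..|&/|'
}

loose_path ce013625030ba8dba906f756967f9e9ca394464a
# prints ce/013625030ba8dba906f756967f9e9ca394464a
```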

    $ git fsck
    error: sha1 mismatch ce013625030ba8dba906f756967f9e9ca394464a

    error: ce013625030ba8dba906f756967f9e9ca394464a: object corrupt or missing
    missing blob ce013625030ba8dba906f756967f9e9ca394464a

The "hello" blob is ce0136, and the tree contained in HEAD expects "hello"
in that loose object file, but fsck notices that the contents do not match
the filename.

So far, so good. Let's see what others see when they interact with this
repository.

    $ cd ../
    $ git init dst
    $ cd dst
    $ git config receive.fsckobjects true
    $ git remote add origin ../src
    $ git config branch.master.remote origin
    $ git config branch.master.merge refs/heads/master
    $ git fetch
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From ../src
     * [new branch]      master     -> origin/master

Oops? If we run "fsck" at this point, we would notice the breakage:

    $ git fsck
    notice: HEAD points to an unborn branch (master)
    broken link from    tree 1c93b84c9756b083e5751db1f9ffa7f80ac667e2
                  to    blob ce013625030ba8dba906f756967f9e9ca394464a
    missing blob ce013625030ba8dba906f756967f9e9ca394464a
    dangling blob b023018cabc396e7692c70bbf5784a93d3f738ab

Here, b02301 is the true identity of the "bye" blob the src repository
crafted and tried to fool us into believing it is "hello".  We can see
that the object transfer gave three objects, and because we only propagate
the contents and have the receiving end compute the object names from the
data, we received b02301 but not ce0136.
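
That the name follows from the data alone can be checked by hand. A blob
is named by the SHA-1 of a short header ("blob", a space, the size in
bytes, and a NUL) followed by the contents. The sketch below assumes a
repository using SHA-1 object names and a "sha1sum" command on the PATH;
blob_name is a made-up helper for illustration, not a git command.

```shell
# blob_name is a hypothetical helper, not part of git.
# $1 is the path to a file; the output is the blob name git would give it.
blob_name () {
	size=$(wc -c <"$1" | tr -d ' ')
	{
		# header: "blob <size>" followed by a NUL byte
		printf 'blob %s\0' "$size"
		cat "$1"
	} | sha1sum | cut -d' ' -f1
}
```

Feeding it a file that holds "hello" and a newline gives ce0136..., and a
file that holds "bye" and a newline gives b02301..., which is why the
receiving end could only ever store the latter under its true name.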

    $ ls .git/objects/??/?*
    .git/objects/1c/93b84c9756b083e5751db1f9ffa7f80ac667e2
    .git/objects/61/5d8c76daef6744635c87fb312a76a5ec7462ea
    .git/objects/b0/23018cabc396e7692c70bbf5784a93d3f738ab

As a side note, if we did "git pull" instead of "git fetch", we would have
also noticed the breakage, like so:

    $ git pull
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From ../src
     * [new branch]      master     -> origin/master
    error: unable to find ce013625030ba8dba906f756967f9e9ca394464a
    error: unable to read sha1 file of greetings (ce013625030ba...)

But the straight "fetch" did not notice anything fishy going on. Shouldn't
we have?  Even though we may be reasonably safe, unpack-objects should be
able to do better, especially with the receive.fsckobjects option set.

Also as a side note, if we set

    $ git config fetch.unpacklimit 1

before we run this "git fetch", we end up storing a single pack, whose
contents are the same three objects above (as expected), and we do not get
any indication of an error from the command.
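
One way to look at what ended up in the pack is to read its index. This
is only a sketch: pack_object_names is a made-up helper, and it assumes
it is run at the top of a non-bare repository.

```shell
# pack_object_names is a hypothetical helper, not part of git.
# It prints the name of every object stored in any pack of this repository.
pack_object_names () {
	for idx in .git/objects/pack/*.idx
	do
		# the glob stays unexpanded when there is no pack at all
		[ -f "$idx" ] || continue
		# show-index prints "<offset> <name> (<crc>)" per object
		git show-index <"$idx" | cut -d' ' -f2
	done
}
```

In the scenario above this would be expected to list 1c93b8, 615d8c and
b02301, with ce0136 again absent.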

I think the breakages are:

 - The sending side does not give any indication that it _wanted_ to send
   ce0136 but couldn't, and ended up sending another object;

 - The pack data sent over the wire was self consistent (no breakage here)
   and sent three well-formed objects, but it was inconsistent with
   respect to what history was being transferred (breakage is here);

 - The receiving end did not notice the inconsistency.

The first one is of lower priority, as the client side should be able
to notice an upstream with corruption in any case. Perhaps after asking
for objects between "have" and "want", "git fetch" should verify that it
can fully walk the subhistory that was supposed to be transferred down to
the blob level?
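
A rough sketch of what such a check could look like from the command
line follows. It is not how "git fetch" works, only an illustration of
the idea; check_connectivity is a made-up helper name. It only tests
that every reachable object exists, which should be enough here because
the receiving end names objects by their contents.

```shell
# check_connectivity is a hypothetical helper, not part of git.
# $1 is the tip to verify, e.g. origin/master.
# Returns 0 when every commit, tree and blob reachable from it exists.
check_connectivity () {
	# some versions of rev-list fail by themselves on a missing object
	objs=$(git rev-list --objects "$1" 2>/dev/null) || return 1
	# otherwise ask cat-file about each listed object name
	if printf '%s\n' "$objs" | cut -d' ' -f1 |
		git cat-file --batch-check | grep -q ' missing$'
	then
		return 1
	fi
	return 0
}
```

Run in dst after the fetch above, "check_connectivity origin/master"
would be expected to fail because ce0136 is not there. A real
implementation would also want to stop the walk at what we already had
before the fetch, rather than going all the way down the history.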