Re: [PATCH v2] Perform cheaper connectivity check when pack is used as medium

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> When we fetch or push, usually "git rev-list --verify-objects --not
> --all --stdin" is used to make sure that there are no gaps between
> existing refs and new refs. --verify-objects calls parse_object(),
> which internally calls check_sha1_signature() to verify object content
> matches its SHA-1 signature.
>
> check_sha1_signature() is an expensive operation, especially when new
> refs are far away from existing ones because all objects in between
> are re-hashed. However, if we receive new objects by pack, we can
> skip the operation because packs themselves do not contain SHA-1
> signatures. All signatures are recreated by unpack-objects/index-pack's
> hashing objects in the pack, which we can trust.
>
> Detect pack transfer cases and turn --verify-objects to --objects.

After thinking more about this patch, I do not think this "optimization"
is correct.

Consider a case where you have this history


    ---T       o---o---o

where 'T' is the tip of your ref (everything reachable from it is known to
be good), and 'o' are "isolated islands" commits that somehow exist in
your repository but are not complete for whatever reason.

The global history may look like this, where 'X' is the tip the other end
advertised, and '.' are commits that are new to your repository.

    ---T---.---o---o---o---.---X

Upon seeing 'X' advertised and noticing 'T' is the current tip, you would
ask for everything that is needed, i.e. "rev-list --objects T..X", to be
sent to you.

But the other end could send all the trees and commits for 'X' and '.' but
nothing for 'o'.  You will end up with a corrupt history but the loosened
"rev-list --objects T..X" you run after the object transfer will not catch
it.  You need --verify-objects if you want to catch this mode of attack.

Admittedly, in order to mount such an attack successfully, the attacker
needs to know what "isolated islands" you happen to have in the receiving
repository, which makes it much harder than a simpler attack (e.g. sending
only X but nothing else) that would have worked before we introduced the
connectivity check after object transfer.

But we need to realize that the reasoning expressed in your log message
"we received new objects by pack, so everything must validate fine" is not
valid, and we are loosening ourselves to new attacks when we evaluate this
patch.  This is because the completion of the history may be using objects
that did not come from the other side (i.e. commits 'o' and all the trees
necessary for them, but their blobs may be corrupt).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]