Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>> What do you mean by "partial history"? Do we have dangling pointers
>> after doing that commit walker?
>
> "^C" will leave the objects and it is safe because it will not
> update refs.
>
> But your code that does not verify the full connectivity from such
> an object (that is outside the transferred pack) to the rest of the
> history will then make the resulting repository broken, if you
> update your ref to point at such an object, no?  Ading a single
> has_sha1_file() only verifies that single object, not the history
> behind it.

Let's illustrate.  Imagine your project as a whole has this history:

---A---B---C---D---E

When you cloned, it only had up to A, so that is what you have.
Then you try http walker, which slurps E, wants to go to D and dig
down, but after fetching E, some trees and blobs in E, and fetching
D, but before completing D' trees and blobs, your ISP cuts you off.

You have these in your object store:

---A           D---E

but your ref is pointing at A, because we promise that we make sure
full connectivity before we update the ref, and even if you have
commits D and E, the associated trees, blobs, and commits behind
this subpart of the history are missing.

Now you try to fetch from another mirror over the pack transport.
You advertise that you have A (but you do not advertise E, because
your refs do not point at it), and you expect all objects that are
listed in "rev-list --objects A..E" to be gien to you.

Unfortunately, the mirror is broken, in that it only packs the
objects that appear in "rev-list --objects A..B", but still tells
you that it is sending objects to complete history leading to E.

Your object store would have objects like this after this transfer:

---A---B       D---E

You may still have commits D and E unpruned, but are missing C, and
trees and blobs that are needed in D or E.

You have to walk the history from E and list the necessary objects
yourself, running has_sha1_file() on all of them, to prove that you
have everything needed, and the only things you can trust are your
refs (in this case, A).

If you verify that all the objects in the transferred pack are
complete, and also verify that the object the transfer proposes to
update your ref to is _in_ that pack, then you can detect a mirror
that is broken and only sends objects for A..B, but that does not
actually solve the issue.  Imagine another broken mirror that
instead sends objects for E.  E, its trees and all its blobs may be
in the pack and the only outgoing link from that pack may be a
pointer out of E pointing at D.  You happen to _have_ it in your
object store, but that does not mean you can update your ref to E
(remember, you do not have trees and blobs to complete D, or the
history behind it).

The current check is implemented in the way we currently do it,
_not_ because we were stupid and did not realize the optimization
possibility (in other words, what your patch proposes is not new),
but because we rejected this approach because it was not safe.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]