Re: sha-1 check in rev-list --verify-objects redundant?

On Mon, Feb 27, 2012 at 4:37 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>
>> On the well-formedness, unless I'm mistaken, --verify-objects is
>> _always_ used in conjunction with index-pack.
>
> Hmm, you are making my head hurt.  Is the above "always" a typo of
> "never"?
>
> The static check_everything_connected() function in builtin/fetch.c is a
> direct callsite of "rev-list --verify-objects", and the function is used
> in two codepaths:
>
>  * store_updated_refs() that is used after we receive and store objects
>   from the other end.  We may or may not have run index-pack in this
>   codepath; in either case we need to make sure the other side did send
>   everything that is needed to complete the history between what we used
>   to have and what they claimed to supply us, to protect us from a broken
>   remote side.

I stand corrected. --verify-objects is _usually_ used in conjunction
with index-pack, when the medium is a pack (i.e. no remote helpers).

>  * quickfetch() that is called even before we get any object from the
>   other end, to optimize the transfer when we already have what we need.
>
> The latter is the original use to protect against unconnected island of
> chain I explained in the previous message, but the former is also about the
> same protection, in a different callchain.

I think we can trust what we already have, so in the latter case (and
the former when the medium is a pack), --objects should suffice.
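
To make the distinction concrete, here is a toy sketch in Python (not
git's actual code). It assumes a hypothetical in-memory store that maps
an id to raw bytes, with the ids of referenced objects listed one per
line after the first line. Unlike git, it hashes the raw bytes without a
"type size" header. The names make_object(), children_of() and walk()
are made up for illustration. With verify=False the walk only checks
that everything is connected, which is roughly what --objects gives;
with verify=True it also rehashes each object before trusting it.

```python
import hashlib


def make_object(store, payload, children=()):
    """Store a toy object; its id is the SHA-1 of its raw bytes."""
    data = "\n".join([payload, *children]).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid


def children_of(data):
    """Ids referenced by a toy object: every line after the first."""
    return data.decode().split("\n")[1:]


def walk(store, tips, verify=False):
    """Walk from tips; return (ids seen, list of problems found).

    Missing objects are always reported (connectivity). Hash mismatches
    are reported only when verify=True (integrity).
    """
    seen, problems = set(), []
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        data = store.get(oid)
        if data is None:
            problems.append(("missing", oid))
            continue
        if verify and hashlib.sha1(data).hexdigest() != oid:
            problems.append(("corrupt", oid))
            continue  # do not follow pointers read from a bad object
        stack.extend(children_of(data))
    return seen, problems
```

A tampered object passes the plain walk and is only caught with
verify=True, while a missing object is caught in both modes. That is
the sense in which the plain walk is enough once something else (e.g.
index-pack) has already checked the hashes.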

> In both cases, the check by --verify-objects is about completeness of the
> history (is everything connected to the tips of refs we have?), and is
> different from integrity of individual objects (is each individual object
> well formed and hash correctly?).  Both kinds of sanity need to be
> checked, as they are orthogonal concepts.
>
> In order to check the history completeness, we need to read the objects
> that we walk during the check. I wouldn't be surprised if the codepath to
> do this is written overly defensively, taking a belt-and-suspenders
> approach, and checks the well-formedness of an object before it reads it
> to find out the other objects pointed to by it.
>
> If we _know_ that we have checked the integrity of all the necessary
> individual objects before we start reading them in order to check the
> completeness of the history, there is an opportunity to optimize by
> teaching --verify-objects paths to optionally be looser than it currently
> is, to avoid checking the object integrity twice.

Ok, will cook something. The reason I raised it is that
"rev-list --verify-objects --all" on git.git can take ~1m10s, but if we
assume object integrity is fine and skip that check, it could drop to
~10s (I suspect --objects gives the same number).
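
For anyone who wants to compare the two modes on their own clone, a
rough timing harness could look like the sketch below. The helper names
rev_list_cmd() and time_rev_list() are made up for illustration; the
only git options used are the ones discussed in this thread. The
numbers will depend on the repository and on how warm the disk cache
is, so treat a single run as a rough indication only.

```python
import subprocess
import time


def rev_list_cmd(verify):
    """Build the rev-list command line for one of the two modes."""
    flag = "--verify-objects" if verify else "--objects"
    return ["git", "rev-list", flag, "--all"]


def time_rev_list(repo, verify):
    """Run rev-list in repo, discard its output, return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        rev_list_cmd(verify),
        cwd=repo,
        stdout=subprocess.DEVNULL,  # the object list itself is not needed
        check=True,                 # raise if rev-list reports a failure
    )
    return time.perf_counter() - start
```

Calling time_rev_list(path, verify=True) and then
time_rev_list(path, verify=False) on the same clone (path being
wherever your copy of git.git lives) gives the two figures to compare.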
-- 
Duy