Re: [PATCH v2] Perform cheaper connectivity check when pack is used as medium

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 02 Mar 2012 22:59:19 -0800

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> It won't help the unpack-objects case. But unpack-objects is only used
> when the pack has less than a certain number of objects, doing heavy
> check in that case should not take too long. Yes, I was thinking I
> would pass pack identity down the verify-pack callchain for index-pack
> case.

Yes, I think we are on the same track; see below.

>> I also suspect that more than trivial amount of computation is needed to
>> determine if a given object exists only in a single pack, so the end
>> result may not be that much cheaper than the current --verify-objects code.
>
> Objects can exist in multiple packs right now if they are base
> objects. I'm not sure why you need to check for object existence in a
> single pack.

What I meant to say was not "it is in this pack and nowhere else", but
about a check like this:

        static void finish_object(struct object *obj, ...)
        {
                struct packed_git *fetched_pack = cb_data->fetched_pack;

                if (obj->type == OBJ_BLOB && !has_sha1_file(obj->sha1))
                        die("missing");
                if (!info->revs->verify_objects)
                        return;
                if (find_pack_entry_one(obj->sha1, fetched_pack))
                        return; /* we just fetched and ran index-pack on it */
                if (!obj->parsed && obj->type != OBJ_COMMIT)
                        parse_object(obj->sha1);
        }

I think this is the kind of "passing identity down the callchain" both of
us have in mind.  I was trying to say that the find_pack_entry_one() lookup
may not be trivially cheap.  But I am probably worrying too much.
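
For a rough feel of what such a per-pack lookup costs, here is a simplified
sketch.  It assumes an index laid out like a pack .idx file: a 256-entry
fanout table keyed on the first byte of the object name, followed by the
object names in sorted order.  The names toy_pack_index, toy_index_init()
and toy_index_has() are hypothetical and are not git's API; this only
illustrates the technique (fanout plus binary search), not the actual
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* size of a SHA-1 object name in bytes */

/* Hypothetical, simplified stand-in for a pack index (not struct packed_git). */
struct toy_pack_index {
        /* fanout[b] = number of entries whose first byte is <= b */
        uint32_t fanout[256];
        /* object names, sorted in ascending memcmp() order */
        const unsigned char (*names)[HASH_LEN];
};

/* Build the fanout table from an already sorted table of nr object names. */
static void toy_index_init(struct toy_pack_index *idx,
                           const unsigned char (*names)[HASH_LEN],
                           uint32_t nr)
{
        uint32_t i;
        int b;

        memset(idx->fanout, 0, sizeof(idx->fanout));
        for (i = 0; i < nr; i++) {
                /* the binary search below relies on strictly sorted input */
                assert(i == 0 ||
                       memcmp(names[i - 1], names[i], HASH_LEN) < 0);
                idx->fanout[names[i][0]]++;
        }
        /* turn per-byte counts into cumulative counts */
        for (b = 1; b < 256; b++)
                idx->fanout[b] += idx->fanout[b - 1];
        idx->names = names;
}

/* Return 1 if name is present in the index, 0 otherwise. */
static int toy_index_has(const struct toy_pack_index *idx,
                         const unsigned char *name)
{
        /* the fanout narrows the search to names sharing the first byte */
        uint32_t lo = name[0] ? idx->fanout[name[0] - 1] : 0;
        uint32_t hi = idx->fanout[name[0]];

        while (lo < hi) {
                uint32_t mid = lo + (hi - lo) / 2;
                int cmp = memcmp(idx->names[mid], name, HASH_LEN);

                if (!cmp)
                        return 1;
                if (cmp < 0)
                        lo = mid + 1;
                else
                        hi = mid;
        }
        return 0;
}
```

Under these assumptions each call costs one fanout read plus a binary
search over the names sharing the first byte, so it is logarithmic per
object, but it is paid once for every object the traversal visits.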

But now that you have brought it up, I think we may also need to worry about a
corrupt pre-existing loose blob object.  In general, we tend to always
favor reading objects from packs over loose objects, but I do not know
offhand what repacking would do when there are two places it can read the
same object from (it should be allowed to pick whichever is easier to
read).