On Tue, Sep 3, 2013 at 1:49 PM, Jeff King <peff@xxxxxxxx> wrote:
>> - by going through index-pack first, then unpack, we pay extra cost
>>   for completing a thin pack into a full one. But compared to fetch's
>>   total time, it should not be noticeable because unpack-objects is
>>   only called when the pack contains a small number of objects.
>
> ...but the cost is paid by total pack size, not number of objects. So
> if I am pushing up a commit with a large uncompressible blob, I've
> effectively doubled my disk I/O. It would make more sense to me for
> index-pack to learn command-line options specifying the limits, and
> then to operate on the pack as it streams in. E.g., to decide after
> seeing the header to unpack rather than index, or to drop large blobs
> from the pack (and put them in their own pack directly) as we are
> streaming into it (we do not know the blob size ahead of time, but we
> can make a good guess if it has a large on-disk size in the pack).

Yeah, letting index-pack do the work was my backup plan :)

I think if there is a big blob in the pack, then the pack should not be
unpacked at all. If you store big blobs in a separate pack, you already
pay the lookup cost of one more pack in find_pack_entry(), so why go
through the process of unpacking? index-pack still has the advantage of
streaming, though. Will rework.
--
Duy
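
For illustration, a rough sketch of how the "decide after seeing the
header" part could look (not git's actual code; choose_action() and
unpack_limit are made-up names; only the 12-byte pack header layout of
"PACK", version and object count, all big-endian, is real):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/*
 * Sketch only: decide, from the 12-byte pack header, whether a small
 * incoming pack should be exploded into loose objects or kept as a
 * pack and indexed while it streams in.
 */
enum pack_action { ACTION_UNPACK, ACTION_INDEX };

static enum pack_action choose_action(const unsigned char hdr[12],
                                      uint32_t unpack_limit)
{
        uint32_t nr_objects;

        if (memcmp(hdr, "PACK", 4))
                return ACTION_INDEX;    /* not a pack; caller will reject it */

        memcpy(&nr_objects, hdr + 8, 4);
        nr_objects = ntohl(nr_objects);

        /* small packs get exploded, big ones stay packed */
        return nr_objects < unpack_limit ? ACTION_UNPACK : ACTION_INDEX;
}

int main(void)
{
        /* fake header: "PACK", version 2, 5 objects */
        unsigned char hdr[12] = { 'P', 'A', 'C', 'K', 0, 0, 0, 2, 0, 0, 0, 5 };

        printf("%s\n", choose_action(hdr, 100) == ACTION_UNPACK
               ? "unpack-objects" : "index-pack");
        return 0;
}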