On Tue, Sep 3, 2013 at 1:49 PM, Jeff King <peff@xxxxxxxx> wrote:
>> - by going through index-pack first, then unpack, we pay extra cost
>>   for completing a thin pack into a full one. But compared to fetch's
>>   total time, it should not be noticeable because unpack-objects is
>>   only called when the pack contains a small number of objects.
>
> ...but the cost is paid by total pack size, not number of objects. So
> if I am pushing up a commit with a large uncompressible blob, I've
> effectively doubled my disk I/O. It would make more sense to me for
> index-pack to learn command-line options specifying the limits, and
> then to operate on the pack as it streams in. E.g., to decide after
> seeing the header to unpack rather than index, or to drop large blobs
> from the pack (and put them in their own pack directly) as we are
> streaming into it (we do not know the blob size ahead of time, but we
> can make a good guess if it has a large on-disk size in the pack).

Yeah, letting index-pack do the work was my backup plan :)

I think if there is a big blob in the pack, then the pack should not be
unpacked at all. If you store big blobs in a separate pack, you already
pay the lookup cost of one more pack in find_pack_entry(), so why go
through the process of unpacking? index-pack still has the advantage of
streaming, though. Will rework.
--
Duy
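
For illustration, a rough sketch of how the "decide after seeing the
header" part could look (not git's actual code; choose_action() and
unpack_limit are made-up names; only the 12-byte pack header layout of
"PACK", version and object count, all big-endian, is real):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/*
 * Sketch only: decide, from the 12-byte pack header, whether a small
 * incoming pack should be exploded into loose objects or kept as a
 * pack and indexed while it streams in.
 */
enum pack_action { ACTION_UNPACK, ACTION_INDEX };

static enum pack_action choose_action(const unsigned char hdr[12],
                                      uint32_t unpack_limit)
{
        uint32_t nr_objects;

        if (memcmp(hdr, "PACK", 4))
                return ACTION_INDEX;    /* not a pack; caller will reject it */

        memcpy(&nr_objects, hdr + 8, 4);
        nr_objects = ntohl(nr_objects);

        /* small packs get exploded, big ones stay packed */
        return nr_objects < unpack_limit ? ACTION_UNPACK : ACTION_INDEX;
}

int main(void)
{
        /* fake header: "PACK", version 2, 5 objects */
        unsigned char hdr[12] = { 'P', 'A', 'C', 'K', 0, 0, 0, 2, 0, 0, 0, 5 };

        printf("%s\n", choose_action(hdr, 100) == ACTION_UNPACK
               ? "unpack-objects" : "index-pack");
        return 0;
}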