On Tue, Sep 03, 2013 at 06:56:23PM +0700, Nguyen Thai Ngoc Duy wrote:

> > ...but the cost is paid by total pack size, not number of objects. So if
> > I am pushing up a commit with a large uncompressible blob, I've
> > effectively doubled my disk I/O. It would make more sense to me for
> > index-pack to learn command-line options specifying the limits, and then
> > to operate on the pack as it streams in. E.g., to decide after seeing
> > the header to unpack rather than index, or to drop large blobs from the
> > pack (and put them in their own pack directly) as we are streaming into
> > it (we do not know the blob size ahead of time, but we can make a good
> > guess if it has a large on-disk size in the pack).
>
> Yeah letting index-pack do the work was my backup plan :) I think if
> there is a big blob in the pack, then the pack should not be unpacked
> at all. If you store big blobs in a separate pack you already pay the
> lookup cost of one more pack in find_pack_entry(), why go through
> the process of unpacking? index-pack still has the advantage of
> streaming though. Will rework.

In general, our large-blob strategy is to push them out to their own
pack so that we do not incur the I/O overhead of rewriting them whenever
we repack. But the flipside is that we have to pay the cost of an extra
.idx open and lookup for each such object. In the longer term, I think
it might make sense to be able to generate a multi-pack .idx for such a
case (or even to simply store the large blobs in a special area indexed
by the object sha1, as we do for loose objects).

But that is all orthogonal to your patch. I think as long as we are
moving towards "index-pack makes the decisions while it processes the
pack" we are going in a good direction. Even if we do not implement all
of the decisions immediately, it leaves room for doing so later without
loss of efficiency.
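
To make the "decide while the pack streams in" idea concrete, here is a
rough sketch in Python rather than C. It is not index-pack's code, and
nothing like it exists in the patch under discussion. The function names,
the two limits, and the "<" comparison are all assumptions made up for
illustration; the only fixed fact it relies on is the 12-byte pack header
layout ("PACK", version, object count, all big-endian).

```python
import struct

# Hypothetical limits, chosen only for illustration.
# They are not existing index-pack command-line options.
UNPACK_LIMIT = 100                    # fewer objects than this: explode to loose
BIG_ENTRY_BYTES = 512 * 1024 * 1024   # on-disk size that hints at a large blob


def read_pack_header(header):
    """Parse the 12-byte pack header: b'PACK', version, object count."""
    if len(header) < 12:
        raise ValueError("short pack header")
    signature, version, count = struct.unpack(">4sII", header[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    return version, count


def choose_mode(object_count, unpack_limit=UNPACK_LIMIT):
    """Decide right after the header, before any object data is read."""
    return "unpack" if object_count < unpack_limit else "index"


def route_entry(obj_type, on_disk_size, big=BIG_ENTRY_BYTES):
    """Guess from the compressed size whether a blob gets its own pack.

    The real blob size is unknown while streaming, so the on-disk
    (compressed) size of the entry stands in for it.
    """
    if obj_type == "blob" and on_disk_size >= big:
        return "own-pack"
    return "main-pack"
```

A caller would read the header once, call choose_mode() on the count, and
then call route_entry() per entry as it arrives, so the incoming pack is
never written out and re-read just to apply the limits.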

-Peff