Re: [PATCH] {fetch,receive}-pack: drop unpack-objects, delay loosing objects until the end

On Tue, Sep 03, 2013 at 06:56:23PM +0700, Nguyen Thai Ngoc Duy wrote:

> > ...but the cost is paid by total pack size, not number of objects. So if
> > I am pushing up a commit with a large uncompressible blob, I've
> > effectively doubled my disk I/O. It would make more sense to me for
> > index-pack to learn command-line options specifying the limits, and then
> > to operate on the pack as it streams in. E.g., to decide after seeing
> > the header to unpack rather than index, or to drop large blobs from the
> > pack (and put them in their own pack directly) as we are streaming into
> > it (we do not know the blob size ahead of time, but we can make a good
> > guess if it has a large on-disk size in the pack).
> 
> Yeah letting index-pack do the work was my backup plan :) I think if
> there is a big blob in the pack, then the pack should not be unpacked
> at all. If you store big blobs in a separate pack you already pay the
> lookup cost of one more pack in find_pack_entry(), so why go through
> the process of unpacking? index-pack still has the advantage of
> streaming though. Will rework.

In general, our large-blob strategy is to push them out to their own
pack so that we do not incur the I/O overhead of rewriting them whenever
we repack. But the flipside is that we have to pay the cost of an extra
.idx open and lookup for each such object. In the longer term, I think
it might make sense to be able to generate a multi-pack .idx for such a
case (or even to simply store the large blobs in a special area indexed
by the object sha1, as we do for loose objects).

But that is all orthogonal to your patch. I think as long as we are
moving towards "index-pack makes the decisions while it processes the
pack" we are going in a good direction. Even if we do not implement all
of the decisions immediately, it leaves room for doing so later without
loss of efficiency.

-Peff