Re: [PATCH WIP 0/4] Special code path for large blobs

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> 2009/5/29 Nicolas Pitre <nico@xxxxxxx>:
> > However, like I said previously, I'd encapsulate large blobs in a pack
> > right away instead of storing them as loose objects.  The reason is that
> > you can effortlessly repack/fetch/push them afterwards by simply
> > triggering the pack data reuse code path for them.  Extracting large and
> > undeltified blobs from a pack is just as easy as from a loose object.
> 
> Makes sense. And the code looks nice too.
> 
> > To accomplish that, you only need to copy write_pack_file() from
> > builtin-pack-objects.c and strip it to the bone with only one object to
> > write.
> 
> write_pack_file() is too scary to me, I ripped from fast-import.c
> instead. BTW, how does git handle hundreds of single object packs? I
> don't know if prepare_packed_git scales in such cases.

Yeah, it's not going to do that great.

We may be able to improve that code path by sorting any pack whose
index is really small and pack file is really big to the end of
the list, where it's least likely to be matched, so we don't even
bother to load the index into memory during normal commit traversal.
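
A rough sketch of that ordering, not actual git code. The struct,
the function names and both thresholds are made up for illustration;
git's real pack list (struct packed_git) looks different.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one pack; NOT git's struct packed_git. */
struct pack_info {
	const char *name;
	unsigned long num_objects;     /* entries in the .idx */
	unsigned long long pack_size;  /* bytes in the .pack */
};

/* Assumed thresholds, picked only for this example. */
#define FEW_OBJECTS 4
#define BIG_PACK (100ULL * 1024 * 1024)

/* "Index is really small and pack file is really big." */
static int is_big_blob_pack(const struct pack_info *p)
{
	return p->num_objects <= FEW_OBJECTS && p->pack_size >= BIG_PACK;
}

/* qsort comparator: ordinary packs first, big-blob packs last. */
static int cmp_pack(const void *a, const void *b)
{
	const struct pack_info *pa = a, *pb = b;
	return is_big_blob_pack(pa) - is_big_blob_pack(pb);
}

/* Reorder the list so lookups reach the big-blob packs last. */
static void sort_packs(struct pack_info *packs, size_t n)
{
	qsort(packs, n, sizeof(*packs), cmp_pack);
}
```

qsort() is not stable, so the relative order inside each group is
unspecified here; a real change would probably want to keep whatever
ordering the pack list already has for ordinary packs.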

But even with that sorting, it's still going to suck.  Lookup for
a large binary is O(N), where N is the number of large binary
*revisions*.  Yuck.

Really, objects in the 200MB+ range probably should just be in a lone
file named by its SHA-1... aka, a loose object.  Combining them into
a pack is going to be potentially expensive disk IO wise, and may
not gain you very much (if it's over 200 MB compressed with deflate, it's
likely already compressed binary content that may not delta well).
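
For comparison, finding a loose object is just a path computation
from the SHA-1, with no index to scan. A minimal sketch, assuming the
usual layout where the first two hex digits name the directory and the
remaining 38 name the file; loose_object_path() is a hypothetical
helper, not a function from git's source.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<objdir>/xx/yyyy..." from a 40-character
 * hex SHA-1.  Returns 0 on success, -1 if the hex string has the wrong
 * length or the output buffer is too small.
 */
static int loose_object_path(char *out, size_t outlen,
			     const char *objdir, const char *hex)
{
	int n;

	if (strlen(hex) != 40)
		return -1;
	/* First two hex digits form the fan-out directory. */
	n = snprintf(out, outlen, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

So the cost per lookup stays constant no matter how many large
revisions exist, which is the property the single-object packs lose.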

Way back we had that pack-style loose object format, for exactly
these sorts of files, and exactly to avoid having many packs of
just 1 object, but that didn't go anywhere... indeed, Nico deleted
the code that creates that format.

-- 
Shawn.