On Sat, 12 May 2012, Nguyễn Thái Ngọc Duy wrote:

> git usually streams large blobs directly to packs. But there are cases
> where git can create large loose blobs (unpack-objects or hash-object
> over pipe). Or they can come from other git implementations.
> core.bigfilethreshold can also be lowered and introduce a new wave of
> large loose blobs.
>
> Use the streaming interface to read these blobs and compress/write at
> the same time.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>

Comments below.

> ---
>  index-pack's streaming support is on the way. unpack-objects is
>  another story, because I'm thinking of merging it back into
>  index-pack first, which may take more than one release cycle.
>
>  builtin/pack-objects.c | 73 ++++++++++++++++++++++++++++++++++++++++++++----
>  t/t1050-large.sh       | 16 ++++++++++
>  2 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 1861093..98b51c1 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -259,9 +309,14 @@ static unsigned long write_object(struct sha1file *f,
>  	if (!to_reuse) {
>  	no_reuse:
>  		if (!usable_delta) {
> -			buf = read_sha1_file(entry->idx.sha1, &type, &size);
> -			if (!buf)
> -				die("unable to read %s", sha1_to_hex(entry->idx.sha1));
> +			type = sha1_object_info(entry->idx.sha1, &size);

Please don't use sha1_object_info() lightly. This is a potentially
expensive operation, and you really don't want to do it for each object.

As a matter of fact, the information you are looking for has already
been determined earlier. See the code in check_object(), which tries
hard to avoid sha1_object_info() as much as possible. You should
therefore have entry->type and entry->size already set and ready for
you to use.

Nicolas
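
For illustration, a minimal sketch of what the hunk could do with those
cached fields instead. This is untested and assumes the surrounding
context of the quoted hunk plus the open_istream() API from streaming.h
that the patch already builds on; it is not the submitted patch itself:

	if (!usable_delta) {
		struct git_istream *st = NULL;

		/*
		 * check_object() has already determined this object's
		 * type and size; reuse them rather than paying for a
		 * sha1_object_info() lookup once per object.
		 */
		type = entry->type;
		size = entry->size;

		if (type == OBJ_BLOB && size > big_file_threshold) {
			/* large loose blob: stream and deflate on the fly */
			st = open_istream(entry->idx.sha1, &type, &size, NULL);
			if (!st)
				die("unable to stream %s",
				    sha1_to_hex(entry->idx.sha1));
			buf = NULL;
		} else {
			buf = read_sha1_file(entry->idx.sha1, &type, &size);
			if (!buf)
				die("unable to read %s",
				    sha1_to_hex(entry->idx.sha1));
		}
		/* remaining header/deflate/write logic unchanged */
	}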