Junio C Hamano <gitster@xxxxxxxxx> wrote:
> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>
> > +static void stream_blob(
> > +	uintmax_t len,
> > +	unsigned char *sha1out,
> > +	uintmax_t mark)
>
> A funny way to indent and line wrap...

Consistent with something else in the same file that has a similar
task... store_object().  I literally just copied the lines from there
and pasted them here.

> > +	/* Determine if we should auto-checkpoint. */
> > +	if ((pack_size + 60 + len) > max_packsize
> > +		|| (pack_size + 60 + len) < pack_size)
> > +		cycle_packfile();
>
> What's "60" in this math?

IIRC, 60 is 20 bytes for the SHA-1 footer, and another 40 of padding
for the object header and the deflate() wrapping.

Again, the 60 was stolen from the existing store_object(), which
already makes the same assumption.  Only there I think we have len as
the fully compressed size, so the deflate() wrapping is already
accounted for.

> If the data is not compressible, we could even grow and the end result
> might be more than (pack_size + len), busting max_packsize.

Yeah, that's a good point.  I probably should run the length through
deflateBound() and use that here in the test (a rough sketch is at
the end of this message).  But I didn't think it would make a huge
difference.  If the file isn't compressible, what is the real
increment over the uncompressed size?  There's a header and a footer,
and instructions in the stream to indicate literal data, but it's not
like it's doubling in size.  IIRC zlib's worst case for incompressible
input is only a small fraction of a percent, plus a few bytes of
header.

> As we are
> streaming out, we cannot say "oops, let me try again after truncating and
> closing the current file and then opening a new file", and instead may
> have to copy the data from the current one to a new one, and truncate the
> current one. Is this something worth worrying about?

Hmm.  I'm not that worried about it, but then there's the case of a
blob that is larger than the max pack size.  We can't store that,
period.  Should we try to exceed max pack size for that one object?
Or should we refuse?

The major reason for this test is to ensure an object's offset starts
before the 4 GiB boundary, so that a v1 idx can hold the offset.  A
much less important reason is to try to support older 32-bit
filesystems, which can't handle more than 2 GiB per file.

-- 
Shawn.
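
P.S. Since I mentioned deflateBound() above, here is roughly what I
had in mind.  This is only a sketch written from my memory of
fast-import.c, not tested code: it borrows the z_stream "s" and
pack_compression_level that store_object() already uses, and it
needs the stream initialized first because deflateBound() looks at
the compression parameters:

	z_stream s;
	unsigned long bound;

	memset(&s, 0, sizeof(s));
	deflateInit(&s, pack_compression_level);

	/*
	 * Worst case deflate() output for len input bytes; for
	 * incompressible data this is len plus a tiny sliver of
	 * overhead, so the test can no longer be fooled into
	 * busting max_packsize.
	 */
	bound = deflateBound(&s, len);

	/* Determine if we should auto-checkpoint. */
	if ((pack_size + 60 + bound) > max_packsize
		|| (pack_size + 60 + bound) < pack_size)
		cycle_packfile();

The second comparison is the same unsigned overflow guard as before:
if the sum wraps around uintmax_t, it comes out smaller than
pack_size and we cycle the packfile instead of overflowing.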
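
P.P.S. And if the answer to my own question above is "refuse", that
could be as simple as a check before we start streaming; hypothetical
only, not part of this patch:

	/* Refuse blobs that cannot fit even in an empty packfile. */
	if (len + 60 > max_packsize || len + 60 < len)
		die("blob of %" PRIuMAX " bytes exceeds the maximum pack size",
			len);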