Re: If you would write git from scratch now, what would you change?

On Nov 26, 2007 11:52 AM, Nicolas Pitre <nico@xxxxxxx> wrote:
> On Mon, 26 Nov 2007, Dana How wrote:
> > Currently data can be quickly copied from pack to pack,
> > but data cannot be quickly copied blob->pack or pack->blob
> I don't see why you would need the pack->blob copy normally.
True, but that doesn't change the main point.

> > (there was an alternate blob format that supported this,
> >  but it was deprecated).  Using the pack format for blobs
> > would fix this.
>
> Then you can do just that for big enough blobs where "big enough" is
> configurable: encapsulate them in a pack instead of a loose object.
> Problem solved.  Sure you'll end up with a bunch of packs containing
> only one blob object, but given that those blobs are so large to be a
> problem in your work flow when written out as loose objects, then they
> certainly must be few enough not to cause an explosion in the number of
> packs.
Are you suggesting that "git add" create a new pack containing
one blob when the blob is big enough?  Re-using (part of) the pack format
in a blob (or maybe only in some blobs) seems like it would need fewer code changes.
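A minimal Python sketch of what such a one-blob pack could look like, assuming the version 2 pack layout: "PACK", a 4-byte version, a 4-byte object count, the entries, then a SHA-1 trailer over everything before it. `BIG_BLOB_THRESHOLD`, `store_blob()` and `single_blob_pack()` are hypothetical names for illustration, not git code or configuration, and the .idx file that a usable pack also needs is left out.

```python
import hashlib
import struct
import zlib

OBJ_BLOB = 3  # object type code used in pack entry headers
BIG_BLOB_THRESHOLD = 16 * 1024 * 1024  # hypothetical "big enough" setting

def pack_entry_header(type_code: int, size: int) -> bytes:
    # First byte: continuation bit, 3-bit type, low 4 bits of size;
    # each following byte carries 7 more size bits, least significant first.
    byte = (type_code << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def single_blob_pack(data: bytes) -> bytes:
    # Pack header: signature, version 2, one object (big-endian 32-bit fields).
    body = b"PACK" + struct.pack(">II", 2, 1)
    # One entry: type/size header, then the data deflated on its own.
    body += pack_entry_header(OBJ_BLOB, len(data)) + zlib.compress(data)
    # Trailer: SHA-1 of all preceding bytes.
    return body + hashlib.sha1(body).digest()

def store_blob(data: bytes) -> tuple[str, bytes]:
    # Big blobs go straight into their own pack; small ones stay loose.
    if len(data) >= BIG_BLOB_THRESHOLD:
        return "pack", single_blob_pack(data)
    return "loose", zlib.compress(b"blob %d\0" % len(data) + data)
```

For example, `store_blob(b"small")` takes the loose path, while anything at or above the threshold is written as a self-contained one-object pack.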

> > It would also mean blobs wouldn't need to
> > be uncompressed to get the blob type or size I believe.
>
> They already don't.
It looks like sha1_file.c:parse_sha1_header() works on a buffer
that sha1_file.c:unpack_sha1_header() fills in by calling inflate(), right?

It is true you don't have to uncompress the *entire* blob.
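For reference, a loose object is a single zlib stream holding "<type> <size>\0" followed by the data, so reading the header means inflating at least its first few bytes. A hypothetical Python sketch (the function names and the 32-byte prefix are assumptions, not git's own code) that caps inflate's output at just enough bytes to parse the header:

```python
import zlib

def make_loose_object(obj_type: str, data: bytes) -> bytes:
    # Loose-object layout: zlib("<type> <size>\0" + data).
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return zlib.compress(header + data)

def read_loose_header(compressed: bytes, prefix_len: int = 32) -> tuple[str, int]:
    # Inflate only a short prefix; the rest of the stream is never decompressed.
    d = zlib.decompressobj()
    head = d.decompress(compressed, max_length=prefix_len)
    nul = head.index(b"\0")
    obj_type, size = head[:nul].decode().split(" ")
    return obj_type, int(size)
```

So the type and size come back without uncompressing the whole blob, but inflate() is still called to get them.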

> > The equivalent operation in git would require the creation of
> > the blob,  and then of a temporary pack to send to the server.
> > This requires 3 calls to zlib for each blob,  which for very
> > large files is not acceptable at my site.
>
> I currently count 2 calls to zlib, not 3.
I count 3:

Call 1: git-add calls zlib to make the blob.

Call 2: builtin-pack-objects.c:write_one() calls sha1_file.c:read_sha1_file(),
which calls sha1_file.c:unpack_sha1_file(), which calls unpack_sha1_{header,rest}(),
which call inflate() to get the data from the blob into a buffer.

Call 3: write_one() then calls deflate() to compress that buffer into the
new data written to the pack.  All of this is on the "if (!to_reuse) {" path,
which is the one taken when packing a blob.
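The three passes can be modelled with a toy Python sketch. The function names mirror the C functions above only for readability; this is a simplified model that ignores deltas and the pack entry's own uncompressed type/size header, not git's implementation.

```python
import zlib

zlib_passes: list[str] = []  # records every pass zlib makes over the blob

def deflate(buf: bytes) -> bytes:
    zlib_passes.append("deflate")
    return zlib.compress(buf)

def inflate(buf: bytes) -> bytes:
    zlib_passes.append("inflate")
    return zlib.decompress(buf)

def git_add(data: bytes) -> bytes:
    # Call 1: the loose object deflates the "blob <size>\0" header together with the data.
    return deflate(b"blob %d\0" % len(data) + data)

def write_one(loose: bytes) -> bytes:
    # Call 2: inflate the loose object to get the raw data back.
    raw = inflate(loose)
    data = raw[raw.index(b"\0") + 1:]
    # Call 3: deflate the data again; a pack entry's zlib stream holds only the
    # data, because type and size live in a separate entry header (omitted here).
    return deflate(data)
```

Running `write_one(git_add(b"x" * 1_000_000))` leaves `zlib_passes` equal to `["deflate", "inflate", "deflate"]`: the compressed bytes cannot be copied because the two zlib streams cover different input.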

Remember, I'm comparing "p4 submit file" to
"git add file"/"git commit"/"git push", which is the comparison
the users will be making.

On the other hand, I'm looking at code from June,
but I haven't noticed any big changes to it on the list since then.

Calls 2 and 3 would go away if the blob and pack formats were more similar.
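A sketch of that alternative, assuming a hypothetical loose-blob layout (`PackStyleBlob` is an illustrative name, not a git structure) that keeps type and size outside the zlib stream and deflates the data exactly as a pack entry body would. Packing then becomes a byte copy, and type and size are readable without calling inflate().

```python
import zlib
from dataclasses import dataclass

@dataclass
class PackStyleBlob:
    # Hypothetical layout: type and size stored uncompressed,
    # data deflated on its own, as in a pack entry body.
    obj_type: str
    size: int
    deflated: bytes

def git_add(data: bytes) -> PackStyleBlob:
    # The only zlib pass: deflate the data once.
    return PackStyleBlob("blob", len(data), zlib.compress(data))

def write_one(blob: PackStyleBlob) -> bytes:
    # No inflate/deflate: the stored stream is already what the pack needs.
    return blob.deflated
```

Here only one zlib call remains per blob, the one made at add time.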
-- 
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell