Re: If you would write git from scratch now, what would you change?

On Nov 26, 2007 8:58 PM, Nicolas Pitre <nico@xxxxxxx> wrote:
> On Mon, 26 Nov 2007, Shawn O. Pearce wrote:
> > - Loose objects storage is difficult to work with
> >
> >   The standard loose object format of DEFLATE("$type $size\0$data")
> >   makes it harder to work with as you need to inflate at least
> >   part of the object just to see what the hell it is or how big
> >   its final output buffer needs to be.
>
> It is a bit cumbersome indeed, but I'm afraid we're really stuck with it
> since every object SHA1 depends on that format.

Yes, now I remember: this was the same argument you used to
convince me that losing the "new" (deprecated) loose format was OK.
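
To spell the constraint out for anyone following the thread: the
object name is the SHA-1 of exactly that "$type $size\0$data" byte
string, and the standard loose file is just the DEFLATE of those same
bytes. Roughly, in Python (illustrative only, not git code):

import hashlib, zlib

data = b"hello\n"                        # any blob payload
store = b"blob %d\0" % len(data) + data  # "$type $size\0$data"
oid = hashlib.sha1(store).hexdigest()    # the object name is pinned to these bytes
loose = zlib.compress(store)             # standard loose format: DEFLATE of the same bytes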

However, if we changed
WRITE(DEFLATE(SHA1("$type $size\0$data")))
(where SHA1(x) = x but has the side effect of updating the SHA-1)
to
WRITE($pack_style_object_header)
SHA1("$type $size\0")
WRITE(DEFLATE(SHA1($data)))
then the SHA-1 result is the same but we get the pack-style header,
and blobs can be sucked straight into packs when not deltified.
The SHA-1 result is still usable at the end to rename the temporary
loose object file (and put it in the correct xx subdirectory).
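
In sketch form the proposed write path would look something like this
(Python again, every name here is made up for illustration;
pack_object_header() follows the type-plus-varint-size layout of a
pack entry header):

import hashlib
import zlib

# Pack object type codes as used in a pack entry header.
OBJ_TYPES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}

def pack_object_header(obj_type, size):
    # Pack-style header: 3-bit type and the low 4 bits of the size in
    # the first byte, then 7 more size bits per byte while the MSB is set.
    byte = (OBJ_TYPES[obj_type] << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def write_loose_pack_style(obj_type, data, tmp_path):
    # The SHA-1 still covers the canonical "$type $size\0$data" bytes,
    # so the object name is unchanged ...
    sha = hashlib.sha1()
    sha.update(b"%s %d\0" % (obj_type.encode(), len(data)))
    sha.update(data)
    # ... but the file on disk gets a pack-style header plus DEFLATE($data),
    # which a packer could copy verbatim for a non-deltified object.
    with open(tmp_path, "wb") as f:
        f.write(pack_object_header(obj_type, len(data)))
        f.write(zlib.compress(data))
    # The hex name is available afterwards to rename tmp_path into place.
    return sha.hexdigest()

The point is just that the hash input and the on-disk bytes no longer
have to be the same string.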

Because we can't change the SHA-1 result, we unfortunately can
never drop the 2nd SHA1 call above [this is something that could
have been different, to respond to the email that started this
thread]. You didn't like the duplication between the 1st and 2nd
calls, but I can't say I see that as a big deal.

> >   It also makes it very hard to stream into a packfile if you have
> >   determined it's not worth creating a delta for the object (or no
> >   suitable delta base is available).
> >
> >   The new (now deprecated) loose object format that was based on
> >   the packfile header format simplified this and made it much
> >   easier to work with.
>
> Not really.  Since separate zlib compression levels for loose objects
> and packed objects were introduced, there was a bunch of correctness
> issues.  What do you do when both compression levels are different?
> Sometimes ignore them, sometimes not? Because the default loose object
> compression level is about speed and the default pack compression level
> is about good space reduction, the correct thing to do by default would
> have been to always decompress and recompress anyway when copying an
> otherwise unmodified loose object into a pack.

Not exactly. I did think about this. When you are packing to stdout,
and only sending the resulting packfile locally, you don't want to
bother with recompressing everything. [This is the "workgroup" case
that concerns me.] In other cases, sure, recompression could help:
packing to a file means the file will probably be around for a while,
so you want to recompress if the levels are unequal, and you probably
want to recompress as well if the packfile will be sent over a "slow"
link.
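
Something like this, roughly (a hypothetical sketch; the parameter
names are illustrative, not actual pack-objects options):

def should_recompress(loose_level, pack_level, pack_to_stdout, slow_link):
    # Reinflate and re-deflate a reused loose object while packing?
    if loose_level == pack_level:
        return False   # same settings: recompressing gains nothing
    if pack_to_stdout and not slow_link:
        return False   # local "workgroup" transfer: favour speed
    return True        # the pack will stick around, or the link is slow:
                       # spend the CPU for the better compression level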

Thanks,
-- 
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell
