Re: If you would write git from scratch now, what would you change?

On Mon, 26 Nov 2007, Nicolas Pitre wrote:

> On Mon, 26 Nov 2007, Shawn O. Pearce wrote:
> 
> > - Loose objects storage is difficult to work with
> > 
> >   The standard loose object format of DEFLATE("$type $size\0$data")
> >   makes it harder to work with as you need to inflate at least
> >   part of the object just to see what the hell it is or how big
> >   its final output buffer needs to be.
> 
> It is a bit cumbersome indeed, but I'm afraid we're really stuck with it 
> since every object SHA1 depends on that format.

No. 

The SHA1 itself just depends on "$type $size\0$data" (no deflate phase)(*), 
and that one is easy and cheap to calculate. How we then *encode* the data 
on disk is totally immaterial.
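To make the distinction concrete, here is a minimal Python sketch. It assumes only what is stated above: the object name is the SHA-1 of the uncompressed "$type $size\0$data" bytes, and the loose on-disk file is a zlib (DEFLATE) stream of those same bytes. The helper names (canonical_form, object_id, encode_loose, read_loose_header) are hypothetical, not git's own functions. The last helper also illustrates Shawn's complaint: to learn the type and size you have to inflate at least the start of the stream.

```python
import hashlib
import zlib


def canonical_form(obj_type: str, data: bytes) -> bytes:
    # "$type $size\0$data": the exact bytes the object name is computed over
    return f"{obj_type} {len(data)}".encode("ascii") + b"\0" + data


def object_id(obj_type: str, data: bytes) -> str:
    # SHA-1 of the *uncompressed* canonical form; no deflate step involved
    return hashlib.sha1(canonical_form(obj_type, data)).hexdigest()


def encode_loose(obj_type: str, data: bytes, level: int = 6) -> bytes:
    # Loose-object style encoding: a zlib stream of the same canonical bytes
    return zlib.compress(canonical_form(obj_type, data), level)


def read_loose_header(raw: bytes) -> tuple[str, int]:
    # Inflate only as much as needed to reach the NUL that ends the header
    d = zlib.decompressobj()
    buf, tail = b"", raw
    while b"\0" not in buf and not d.eof:
        chunk = d.decompress(tail, 16)  # at most 16 more output bytes
        tail = d.unconsumed_tail
        if not chunk:
            break  # truncated or corrupt stream
        buf += chunk
    header, sep, _ = buf.partition(b"\0")
    if not sep:
        raise ValueError("no object header found")
    obj_type, size = header.decode("ascii").split(" ")
    return obj_type, int(size)


if __name__ == "__main__":
    blob = b"hello\n"
    print(object_id("blob", blob))  # ce013625030ba8dba906f756967f9e9ca394464a
    fast = encode_loose("blob", blob, 1)
    small = encode_loose("blob", blob, 9)
    # Different bytes on disk, same decoded object, same name
    print(fast != small, zlib.decompress(fast) == zlib.decompress(small))  # True True
    print(read_loose_header(small))  # ('blob', 6)
```

The name for "hello\n" matches what `git hash-object` reports for that content, while the two encodings differ byte for byte. That is exactly why the on-disk encoding can change without touching any SHA1.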

In fact, pack-files obviously do not encode it in that form at all; they 
use two different forms, "$binaryhdr$DEFLATE($data)" or 
"$binaryhdr$basesha$DEFLATE($delta)" (that's from memory, so don't rely on 
that).

So we could easily change the on-disk format, and obviously we already 
have: the alternate (but deprecated) format for unpacked objects did. In 
fact, we could - and probably should - add some kind of "back end 
interface" for alternate encoding formats, in case somebody wants to do 
something really crazy like use a database for object tracking.
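A "back end interface" of that kind could be as small as the sketch below. Everything in it is hypothetical (ObjectBackend, DeflateBackend, PlainBackend are illustrative names, and a dict stands in for real storage); the point it shows is that the object name is computed once from the canonical bytes, so each back end is free to choose its own encoding without changing any SHA1.

```python
import hashlib
import zlib
from abc import ABC, abstractmethod


def canonical_form(obj_type: str, data: bytes) -> bytes:
    return f"{obj_type} {len(data)}".encode("ascii") + b"\0" + data


class ObjectBackend(ABC):
    """Hypothetical pluggable object store: subclasses choose only the encoding."""

    def __init__(self) -> None:
        self._stored: dict[str, bytes] = {}  # stand-in for disk, network, ...

    def write(self, obj_type: str, data: bytes) -> str:
        raw = canonical_form(obj_type, data)
        oid = hashlib.sha1(raw).hexdigest()  # same name for every back end
        self._stored[oid] = self.encode(raw)
        return oid

    def read(self, oid: str) -> tuple[str, bytes]:
        raw = self.decode(self._stored[oid])
        header, _, data = raw.partition(b"\0")
        obj_type, size = header.decode("ascii").split(" ")
        if int(size) != len(data):
            raise ValueError(f"corrupt object {oid}")
        return obj_type, data

    @abstractmethod
    def encode(self, raw: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, stored: bytes) -> bytes: ...


class DeflateBackend(ObjectBackend):
    # Loose-object style: a zlib stream of the canonical bytes
    def encode(self, raw: bytes) -> bytes:
        return zlib.compress(raw)

    def decode(self, stored: bytes) -> bytes:
        return zlib.decompress(stored)


class PlainBackend(ObjectBackend):
    # No compression at all; the bytes are stored as-is
    def encode(self, raw: bytes) -> bytes:
        return raw

    def decode(self, stored: bytes) -> bytes:
        return stored
```

Writing the same blob through both back ends yields the same name even though the stored bytes differ.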

(Side note: using an actual database would really be insane. There is 
absolutely zero point. But what *could* be interesting would be to have a 
"cluster back-end" for the git object store, where objects get hashed to 
different nodes. If you have a really fast network, it may actually be 
beneficial to spread the objects out, and get better disk throughput by 
that kind of strange "git object RAID-0 striping" setup)
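The "hashed to different nodes" idea could look roughly like this sketch. It assumes a fixed number of nodes and uses plain modulo on a prefix of the object name; StripedStore and node_for are hypothetical names, and in-memory dicts stand in for the nodes. Because SHA-1 output is already uniformly spread, no extra hashing or central index is needed: the name alone says which node holds an object.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def node_for(oid: str, num_nodes: int) -> int:
    # SHA-1 names are uniformly distributed, so a prefix works as a shard key
    return int(oid[:8], 16) % num_nodes


class StripedStore:
    """Hypothetical 'RAID-0' object store: each object lives on exactly one node."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: list[dict[str, tuple[str, bytes]]] = [{} for _ in range(num_nodes)]

    def put(self, obj_type: str, data: bytes) -> str:
        oid = object_id(obj_type, data)
        self.nodes[node_for(oid, len(self.nodes))][oid] = (obj_type, data)
        return oid

    def get(self, oid: str) -> tuple[str, bytes]:
        # No lookup table: the object name determines which node to ask
        return self.nodes[node_for(oid, len(self.nodes))][oid]
```

Plain modulo keeps the sketch short but moves most objects whenever the node count changes; a real setup would more likely use consistent hashing or a fixed number of virtual shards.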

		Linus

(*) Honesty in advertising: the really *original* format did the SHA1 
after the deflate, but that was quickly fixed and was a really stupid 
choice. The main point for doing that was that it meant that loose objects 
could be verified by just running "sha1sum" on them, and comparing the 
result with their name.
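One concrete drawback of hashing after the deflate (my own illustration, not something stated above) is that the name then depends on the compressor's output rather than on the content. The sketch below uses Python's zlib; the two helper names are hypothetical. Compressing the same canonical bytes at two zlib levels gives two different "names", while the hash of the uncompressed bytes stays put.

```python
import hashlib
import zlib

# "$type $size\0$data" for a blob containing "hello\n"
CANONICAL = b"blob 6\0hello\n"


def name_before_deflate(payload: bytes) -> str:
    # Current scheme: hash the canonical bytes, encode them however you like
    return hashlib.sha1(payload).hexdigest()


def name_after_deflate(payload: bytes, level: int) -> str:
    # Original scheme: hash the deflated file, so `sha1sum` on it matches its name
    return hashlib.sha1(zlib.compress(payload, level)).hexdigest()


if __name__ == "__main__":
    print(name_before_deflate(CANONICAL))  # ce013625030ba8dba906f756967f9e9ca394464a
    # Same content, different compression level, different "name"
    print(name_after_deflate(CANONICAL, 1) != name_after_deflate(CANONICAL, 9))  # True
```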