Re: If you would write git from scratch now, what would you change?

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Mon, 26 Nov 2007 20:48:04 -0500

Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> If you would write git from scratch now, from the beginning, without 
> concerns for backwards compatibility, what would you change, or what 
> would you want to have changed?

- Sort tree entries by name, *not* by name+type

  This has got to be my biggest gripe with Git.  I think Linus really
  screwed the pooch with this.  We've talked it over a few times
  on the list and he and I have just agreed to disagree on this.

  Ask any database person and they'll tell you how wrong the
  current tree ordering is.  Or they are nuts and don't get
  the concept of data integrity.

  Linus' excuse is that the current ordering makes working with
  the flat index faster as it's just one index file.  That doesn't
  mean that the flat index file can't contain tree information.
  Like it does in, say, that newfangled cache-tree extension.  :-)

  This particular "design decision" has brought all sorts of bugs
  into the system, like the D/F merge conflict issues, and even one
  from Linus himself when he first introduced the submodule support.
  Let's not even talk about how ugly that made things in jgit.
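
  To make the difference concrete, here is a minimal sketch of the
  two orderings.  It assumes (as I understand the current rule) that
  a subtree compares as if its name had a trailing "/" appended; the
  helper names are made up for illustration and are not Git functions.

```python
def git_tree_key(name: bytes, is_tree: bool) -> bytes:
    # Assumption: a subtree sorts as if its name ended in "/".
    return name + b"/" if is_tree else name


def sort_git_order(entries):
    # Current behaviour: the sort key depends on name *and* type.
    return sorted(entries, key=lambda e: git_tree_key(e[0], e[1]))


def sort_by_name(entries):
    # Proposed behaviour: the sort key is the name alone.
    return sorted(entries, key=lambda e: e[0])


# (name, is_tree) pairs: one directory "foo" and two plain files.
entries = [(b"foo", True), (b"foo.c", False), (b"foo-bar", False)]

by_git = [name for name, _ in sort_git_order(entries)]
by_name = [name for name, _ in sort_by_name(entries)]
# by_git  is [b"foo-bar", b"foo.c", b"foo"]
# by_name is [b"foo", b"foo-bar", b"foo.c"]
```

  Note how the position of "foo" changes depending on whether it is
  a directory, so you cannot look up a name without knowing its type.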

- Loose object storage is difficult to work with

  The standard loose object format of DEFLATE("$type $size\0$data")
  makes it harder to work with as you need to inflate at least
  part of the object just to see what the hell it is or how big
  its final output buffer needs to be.

  It also makes it very hard to stream into a packfile if you have
  determined it's not worth creating a delta for the object (or no
  suitable delta base is available).

  The new (now deprecated) loose object format that was based on
  the packfile header format simplified this and made it much
  easier to work with.
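
  A rough sketch of the problem with the standard format, using
  Python's zlib module.  The function names and the 64-byte chunk
  size are arbitrary choices for illustration; this is not how Git
  itself reads objects, only a demonstration that the type and size
  are unavailable until some inflating has been done.

```python
import zlib


def make_loose(obj_type: str, data: bytes) -> bytes:
    # DEFLATE("$type $size\0$data"): the header is inside the stream.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return zlib.compress(header + data)


def peek_header(compressed: bytes, chunk: int = 64):
    # Inflate a little at a time until the NUL that ends the header.
    inflater = zlib.decompressobj()
    out = inflater.decompress(compressed, chunk)
    while b"\0" not in out:
        more = inflater.decompress(inflater.unconsumed_tail, chunk)
        if not more:
            raise ValueError("no object header found")
        out += more
    header, _, _ = out.partition(b"\0")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size)


loose = make_loose("blob", b"hello world")
kind, size = peek_header(loose)
# kind is "blob" and size is 11
```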

- No proper libgit

  Already been stated but we don't have a great library and we
  don't have a good way to build one right now either.  A lot of
  our internal code assumes die() will abort the process.  That's a
  very bad assumption to be making inside of a library.

- Binary packed-refs representation

  I probably wouldn't have done an ASCII based packed-refs file,
  or heck, even loose refs.  I probably would have just gone with
  a binary file that we wholesale rewrite every time there is any
  sort of ref update.

  We already do this with the index.  So every time we update a
  file path we are rewriting the entire index.  And we update
  file paths a heck of a lot more often than we update branch
  heads.  Or tags.

  But tools like for-each-ref get invoked heavily, and fast access
  to the ref database is important to overall performance.
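
  Something along these lines is what I have in mind.  The layout
  below (a count, then length-prefixed names each followed by a
  20-byte SHA-1) is purely hypothetical and is not a format Git
  uses; the point is only that every update rewrites the whole file
  and swaps it into place, the same way the index is handled.

```python
import os
import struct
import tempfile


def write_refs(path: str, refs: dict[str, bytes]) -> None:
    # Hypothetical layout: u32 count, then for each ref in name order:
    # u16 name length, name bytes, 20-byte binary SHA-1.
    parts = [struct.pack(">I", len(refs))]
    for name in sorted(refs):
        sha = refs[name]
        if len(sha) != 20:
            raise ValueError("expected a 20-byte SHA-1")
        raw = name.encode()
        parts.append(struct.pack(">H", len(raw)) + raw + sha)
    # Write the whole file to a temporary name, then rename it over
    # the old one so readers never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(b"".join(parts))
    os.replace(tmp, path)


def read_refs(path: str) -> dict[str, bytes]:
    with open(path, "rb") as f:
        data = f.read()
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    refs = {}
    for _ in range(count):
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        name = data[pos:pos + n].decode()
        pos += n
        refs[name] = data[pos:pos + 20]
        pos += 20
    return refs
```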

- No GIT_OBJECT_DIRECTORY vs. GIT_DIR distinction

  This causes problems when you use $GIT_DIR/objects/info/alternates
  and then try to repack repositories.  Not having the ref space of
  the alternates and/or borrowers considered during repacking can
  cause all sorts of fun breakage that may be hard to recover from.
  Plus it means you have to do funny "refs/forkee" hacks just to
  avoid pushing unnecessary objects over the wire when the other
  end is borrowing objects.

  I probably would have had the object directory unified with its
  ref database, so that they cannot be accessed individually.

All of the above is written with 20/20 hindsight and all that.

Looking back (and knowing myself well) I think the only item I
would have gotten right if I had written Git from scratch is the
first one above (the tree entry ordering).  I probably would have
done something equally "as bad" as what we have today for all of
the others...

-- 
Shawn.