Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> If you would write git from scratch now, from the beginning, without
> concerns for backwards compatibility, what would you change, or what
> would you want to have changed?

- Sort tree entries by name, *not* by name+type

  This has got to be my biggest gripe with Git. I think Linus really screwed the pooch with this. We've talked it over a few times on the list and he and I have just agreed to disagree on this. Ask any database person and they'll tell you how wrong the current tree ordering is. Or they are nuts and don't get the concept of data integrity.

  Linus' excuse is that the current ordering makes working with the flat index faster, as it's just one index file. That doesn't mean that the flat index file can't contain tree information, like it does in, say, that new-fangled cache-tree extension. :-)

  This particular "design decision" has brought all sorts of bugs into the system, like the D/F merge conflict issues, and even one from Linus himself when he first introduced the submodule support. Let's not even talk about how ugly that made things in jgit.

- Loose object storage is difficult to work with

  The standard loose object format of DEFLATE("$type $size\0$data") makes it harder to work with, as you need to inflate at least part of the object just to see what the hell it is or how big its final output buffer needs to be.

  It also makes it very hard to stream into a packfile if you have determined it's not worth creating a delta for the object (or no suitable delta base is available), since the header is inside the compressed stream and the data has to be inflated and deflated again instead of being copied as-is.

  The new (now deprecated) loose object format that was based on the packfile header format simplified this and made it much easier to work with.

- No proper libgit

  This has already been stated, but we don't have a great library and we don't have a good way to build one right now either. A lot of our internal code assumes die() will abort the process. That's a very bad assumption to be making inside of a library.
- Binary packed-refs representation

  I probably wouldn't have done an ASCII-based packed-refs file, or heck, even loose refs. I probably would have just gone with a binary file that we wholesale rewrite every time there is any sort of ref update.

  We already do this with the index: every time we update a file path we rewrite the entire index, and we update file paths a heck of a lot more often than we update branch heads or tags. But tools like for-each-ref get invoked heavily, and fast access to the ref database is important to overall performance.

- No GIT_OBJECT_DIRECTORY vs. GIT_DIR distinction

  This causes problems when you use $GIT_DIR/objects/info/alternates and then try to repack repositories. Not having the ref space of the alternates and/or borrowers considered during repacking can cause all sorts of fun breakage that may be hard to recover from. Plus it means you have to do funny "refs/forkee" hacks just to avoid pushing unnecessary objects over the wire when the other end is borrowing objects.

  I probably would have had the object directory unified with its ref database, so that they cannot be accessed individually.

All of the above is written with 20/20 hindsight and all that. Looking back (and knowing myself well), I think the only item I would have gotten right if I had written Git from scratch is the first one above (the tree entry ordering). I probably would have done something just as "bad" as what we have today for all of the others...

-- 
Shawn.
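The tree-ordering complaint can be made concrete with a small Python sketch. It assumes the comparison rule Git applies to tree entries: names are compared byte by byte, but a subtree's name is compared as if it ended in "/". The `(name, is_tree)` tuples and helper names are hypothetical; real tree entries also carry a mode and an object ID.

```python
from typing import List, Tuple

# Hypothetical entry representation: (name, is_tree).
Entry = Tuple[bytes, bool]

def git_tree_key(entry: Entry) -> bytes:
    """Name+type ordering: a subtree sorts as if its name ended in '/'."""
    name, is_tree = entry
    return name + b"/" if is_tree else name

def name_only_key(entry: Entry) -> bytes:
    """The ordering argued for in the email: sort purely by entry name."""
    return entry[0]

entries: List[Entry] = [(b"foo", True), (b"foo.c", False), (b"foo-bar", False)]

git_order = [name for name, _ in sorted(entries, key=git_tree_key)]
name_order = [name for name, _ in sorted(entries, key=name_only_key)]

print(git_order)   # [b'foo-bar', b'foo.c', b'foo']
print(name_order)  # [b'foo', b'foo-bar', b'foo.c']

# The flat index sorts full paths byte-wise; "foo/x" lands after "foo.c",
# which is exactly where the name+type rule places the subtree "foo".
index_paths = sorted([b"foo/x", b"foo.c", b"foo-bar"])
print(index_paths)  # [b'foo-bar', b'foo.c', b'foo/x']
```

The last lines hint at the flat-index argument: because "/" is compared like any other byte, the subtree `foo` sorts exactly where its paths (`foo/x`) fall in the byte-sorted index. Sorting by name alone moves `foo` to the front and breaks that correspondence.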
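The contrast between the two loose formats can be sketched in Python. The first half builds and probes an object in the standard DEFLATE("$type $size\0$data") layout; the second half encodes and decodes the packfile's per-object header (a 3-bit type, then the size in a 4-bit group followed by 7-bit groups, with the high bit of each byte meaning "more follows"), which is what the deprecated format was based on. The function names and the 32-byte probe are assumptions for illustration, and the sketch does not reproduce the exact on-disk layout of that deprecated format.

```python
import zlib
from typing import Tuple

# --- Standard loose format: DEFLATE("$type $size\0$data") ---

def make_loose_object(obj_type: str, data: bytes) -> bytes:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return zlib.compress(header + data)

def read_loose_header(raw: bytes, probe: int = 32) -> Tuple[str, int]:
    """Type and size sit inside the compressed stream, so inflate a prefix."""
    prefix = zlib.decompressobj().decompress(raw, probe)  # at most `probe` bytes
    nul = prefix.find(b"\0")
    if nul < 0:
        raise ValueError("no header terminator in inflated prefix")
    obj_type, size = prefix[:nul].split(b" ")
    return obj_type.decode("ascii"), int(size)

# --- Pack-style object header: readable without inflating anything ---

TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}

def encode_pack_header(obj_type: str, size: int) -> bytes:
    byte = (TYPE_CODES[obj_type] << 4) | (size & 0x0F)  # type + low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def decode_pack_header(buf: bytes) -> Tuple[str, int, int]:
    """Return (type, size, header_length) straight from the raw bytes."""
    byte = buf[0]
    obj_type = TYPE_NAMES[(byte >> 4) & 0x07]
    size = byte & 0x0F
    shift, pos = 4, 1
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos

data = b"hello world\n" * 1000
print(read_loose_header(make_loose_object("blob", data)))         # ('blob', 12000)
print(decode_pack_header(encode_pack_header("blob", len(data))))  # ('blob', 12000, 3)
```

Reading the pack-style header touches only a few raw bytes, while the standard format forces a zlib stream to be started just to learn the type and size. Keeping the header outside the compressed data is also what would let the compressed bytes be copied into a packfile rather than re-deflated.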
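A binary ref database that is rewritten wholesale on every update could look roughly like the following Python sketch. The file layout and every function name are hypothetical (Git has no such format), and locking against concurrent writers, which Git handles with `.lock` files, is left out. Writes go to a temporary file that is renamed over the old table, so on a POSIX filesystem a reader sees either the old or the new version, never a half-written one. Lookups binary-search the sorted names.

```python
import bisect
import os
import struct
import tempfile
from typing import Dict, List, Optional, Tuple

# Hypothetical on-disk layout: u32 ref count, then one record per ref,
# sorted by name: u16 name length, UTF-8 name, raw 20-byte SHA-1.

def encode_ref_table(refs: Dict[str, bytes]) -> bytes:
    out = bytearray(struct.pack(">I", len(refs)))
    for name in sorted(refs):  # code-point order matches UTF-8 byte order
        raw = name.encode("utf-8")
        oid = refs[name]
        if len(oid) != 20:
            raise ValueError(f"{name}: expected a raw 20-byte SHA-1")
        out += struct.pack(">H", len(raw)) + raw + oid
    return bytes(out)

def decode_ref_table(buf: bytes) -> Tuple[List[str], List[bytes]]:
    (count,) = struct.unpack_from(">I", buf, 0)
    pos = 4
    names: List[str] = []
    oids: List[bytes] = []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        names.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
        oids.append(buf[pos:pos + 20])
        pos += 20
    return names, oids

def lookup_ref(names: List[str], oids: List[bytes], name: str) -> Optional[bytes]:
    """Binary search over the sorted names: O(log n) per lookup."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return oids[i]
    return None

def write_ref_table(path: str, refs: Dict[str, bytes]) -> None:
    """Rewrite the whole file via a temp file plus rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(encode_ref_table(refs))
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

def update_ref(path: str, name: str, oid: bytes) -> None:
    """Any single ref update rewrites the entire table, as the index does."""
    refs: Dict[str, bytes] = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            refs = dict(zip(*decode_ref_table(f.read())))
    refs[name] = oid
    write_ref_table(path, refs)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "refs.bin")
        update_ref(path, "refs/heads/master", b"\x01" * 20)
        update_ref(path, "refs/tags/v1.0", b"\x02" * 20)
        with open(path, "rb") as f:
            names, oids = decode_ref_table(f.read())
        print(names)
        print(lookup_ref(names, oids, "refs/tags/v1.0").hex())
```

Every `update_ref` call rewrites the whole file, mirroring how the index is rewritten on each path update. That cost is acceptable only because, as the email notes, refs change far less often than they are read.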
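The repacking hazard can be illustrated with a toy reachability walk. In this Python sketch the object store is a plain dict from an object ID to the IDs it references, and all names are hypothetical. A repository whose object directory is borrowed via objects/info/alternates decides what is unreachable using only its own refs, and so selects for pruning an object that a borrower still needs.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # object ID -> IDs it references (parents, trees, ...)

def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """Depth-first walk from the given ref tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))
    return seen

def prunable(graph: Graph, ref_tips: Iterable[str]) -> Set[str]:
    """Objects in the store that none of the listed refs can reach."""
    return set(graph) - reachable(graph, ref_tips)

# Shared store: c1 <- c2 <- c3. The owner deleted the branch that pointed at c3,
# but a borrower (using objects/info/alternates) still has a ref to c3.
store: Graph = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
owner_refs = ["c2"]
borrower_refs = ["c3"]

print(sorted(prunable(store, owner_refs)))                  # ['c3']: breaks the borrower
print(sorted(prunable(store, owner_refs + borrower_refs)))  # []: safe
```

Including the borrower's refs fixes the toy case. In a real setup the hard part is finding them: the alternates file lives in the borrower's repository, so the repository being repacked has no record of who borrows from it.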